Making sense of numbers and words: Statistical methods

Peter Grimbeek


Thoughts about MANOVA

SPSS MANOVA generates various bits of output, including:

Between-Subjects factors

This is a descriptive report of the number of cases per response category for each independent variable (IV) included in the analysis.

Box's test of the equality of covariance matrices

In theory (see Tabachnick & Fidell, Using multivariate statistics), this test examines whether the variance-covariance matrices within each cell of the design (e.g., male & 20-30 yrs) are sampled from the same population variance-covariance matrix. The test is described by Tabachnick and Fidell as notoriously sensitive, and so it tends both to report statistically significant results and to be ignored.

To be fair, I have in fact seen this test report statistically non-significant outcomes, especially where cell sizes (N per cell) are equal and the sample size is larger (hundreds to thousands).

Multivariate tests

The table of multivariate effects provides information about the extent to which specific IVs and combinations of IVs are associated with the pooled dependent variables (DVs).

This test is useful because it lets you know whether a specific IV has a systematic effect across a range of, say, subscales. If not, then it becomes likely that this IV has potentially opposite effects on related subscales, which usually does not make sense from the point of view of more detailed analysis (assuming these subscales are not negatively correlated).
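
For readers who want to see the same kind of multivariate test outside SPSS, here is a minimal sketch in Python using statsmodels; the variable names dv1, dv2, and group, and the scores themselves, are invented for illustration.

    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    # Hypothetical data: two subscale scores (DVs) and one grouping IV.
    df = pd.DataFrame({
        "dv1":   [3.2, 2.8, 3.9, 4.1, 2.5, 3.3, 4.4, 3.0],
        "dv2":   [2.9, 3.1, 3.8, 4.0, 2.2, 3.5, 4.2, 2.7],
        "group": ["m", "m", "f", "f", "m", "m", "f", "f"],
    })

    # Multivariate tests (Pillai's trace, Wilks' lambda, etc.) of the pooled DVs.
    fit = MANOVA.from_formula("dv1 + dv2 ~ group", data=df)
    print(fit.mv_test())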

Levene's test of equality of error variances

Levene's test examines the extent to which the error variance of each DV varies from cell to cell of the design. This test probably should be taken seriously, because a statistically significant outcome suggests serious discrepancies in cell variance that could lead to unreliable statistical outcomes.
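
As an illustration only (the group labels and scores below are invented), Levene's test can be run per DV in Python with scipy:

    from scipy import stats

    # Hypothetical DV scores for three cells of the design.
    young     = [3.1, 2.8, 3.5, 3.0, 2.9]
    young_old = [3.4, 3.9, 4.1, 3.7, 3.6]
    old       = [2.2, 4.8, 1.9, 4.5, 2.6]

    # Levene's test: the null hypothesis is that the cells share equal error variance.
    stat, p = stats.levene(young, young_old, old)
    print(f"Levene W = {stat:.2f}, p = {p:.3f}")  # a small p suggests unequal variances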

Should Levene's test report significant outcomes, one option is to report univariate outcomes anyway, on the basis that the parametric test involved (analysis of variance, or ANOVA) is robust to violations of the relevant assumptions of normality. After all, not all variables are normally distributed. However, doubt may remain not only about the distributional properties but also about the measurement properties of the variable in question (i.e., does the variable possess equal-interval properties, or is it based on ordinal or categorical data and thus less than interval in its conception?).

Scale scores typically are produced by adding item scores, where these item scores represent ordinal responses to Likert scale items. It seems unlikely that the sum of ordinal scores is greater than its nonparametric parts.

Under these conditions, after obtaining a statistically significant Levene test, it might be appropriate to use a nonparametric equivalent to ANOVA such as the Kruskal-Wallis test. This generates a chi-square statistic that evaluates the probability of obtaining a particular difference in mean rankings by chance.
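A minimal sketch of that nonparametric fallback, again with invented scores; scipy reports the H statistic, which SPSS evaluates against a chi-square distribution with (number of groups - 1) degrees of freedom:

    from scipy import stats

    # Hypothetical ordinal (summed Likert) scores for three age groups.
    young     = [12, 14, 11, 15, 13]
    young_old = [18, 17, 19, 16, 20]
    old       = [22, 21, 24, 23, 25]

    # Kruskal-Wallis compares mean ranks rather than means.
    h, p = stats.kruskal(young, young_old, old)
    print(f"Kruskal-Wallis H(2) = {h:.2f}, p = {p:.3f}")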

Tests of between-subject effects

If it is OK to proceed with parametric testing, then the between-groups tests provide information about whether specific IVs or combinations of IVs are significantly associated with specific DVs. That is, this table reports the results of univariate tests using ANOVA.

Typically one reports statistically significant outcomes as follows:

F value, degrees of freedom (treatment), degrees of freedom (error), probability less than some cut-off point (usually .05, .01, .001).

In text the above might be reported as: F(df treatment, df error) = value, p < cut-off.

An example might go: (F(1, 140) = 4.55, p < .05).
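
To make the arithmetic behind that format concrete, here is a hedged sketch using scipy's one-way ANOVA (group scores invented): the treatment degrees of freedom are the number of groups minus 1, and the error degrees of freedom are the total N minus the number of groups.

    from scipy import stats

    # Hypothetical scores for two levels of an IV.
    group_a = [4.1, 3.8, 4.4, 3.9, 4.2]
    group_b = [3.2, 3.5, 3.1, 3.4, 3.0]

    f_value, p_value = stats.f_oneway(group_a, group_b)

    df_treatment = 2 - 1                         # number of groups - 1
    df_error = len(group_a) + len(group_b) - 2   # total N - number of groups

    print(f"F({df_treatment}, {df_error}) = {f_value:.2f}, p = {p_value:.3f}")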

Estimated marginal means

The first mean to be reported is the grand mean. This, like the intercept, usually is ignored in relation to human data.

Estimates for specific IVs and specific DVs provide useful information that includes the mean, the standard error (standard deviation divided by the square root of the sample size), and the lower and upper bounds of the 95% confidence interval.

The combination of mean and standard error provides enough information to make judgments about the likelihood of particular mean scores being significantly different.

For example, a graph containing these two bits of information summarises in visual format the significance of the gap between means. If the error bar (mean plus or minus the standard error) around one mean does not encompass the other mean, then the difference is most likely significant.

The confidence interval (CI) provides equivalent information. That is, if the lower and upper bounds for a CI associated with one mean do not overlap the CI for another mean score, then the difference between these two mean scores is likely to be statistically significant.
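
A small worked sketch of those quantities (scores invented; the interval below uses the t distribution rather than the 1.96 normal approximation):

    import numpy as np
    from scipy import stats

    # Hypothetical scale scores for one cell of the design.
    scores = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.6, 3.2, 3.4])

    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))   # SD / sqrt(N)

    # 95% confidence interval: mean +/- critical t value * SE
    t_crit = stats.t.ppf(0.975, df=len(scores) - 1)
    lower, upper = mean - t_crit * se, mean + t_crit * se

    print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")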

Pairwise comparisons

This table reports the mean difference for a specific IV (e.g., male vs. female scores) related to a specific DV.

This table parallels the information provided by the estimates table and adds to it an estimate of the statistical significance of the gap between these means. The table can be useful for IVs with multiple levels (e.g., where participants are collapsed into multiple age groups such as young, young-old, and old that correspond to specific cut-offs). One might discover that scores on a specific DV are significantly different for young participants versus young-old or old participants, but that young-old versus old participant scores are not statistically distinct.
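
One way to obtain comparable pairwise comparisons outside SPSS is Tukey's HSD in statsmodels; note that the SPSS pairwise comparisons table applies whatever adjustment is requested (e.g., LSD, Bonferroni, Sidak), so this sketch illustrates the logic rather than replicating that table. Scores and group labels are invented.

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Hypothetical DV scores and age-group labels.
    scores = np.array([12, 14, 11, 15, 18, 17, 19, 16, 18, 20, 19, 17])
    groups = np.array(["young"] * 4 + ["young-old"] * 4 + ["old"] * 4)

    # Every pairwise mean difference, with an adjusted test of significance.
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05).summary())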

General comments

The output considered above is for a between-groups MANOVA with two or more DVs and one or more IVs.

Other options include repeated measures MANOVA and mixed-model ANOVA (repeated measures plus between-groups effects).

It is tempting to use repeated measures MANOVA to analyse multivariate DVs consisting of two or more subscales. Doing so generates a multivariate effect for scale, which may or may not be significant. If it is, then what you have learnt is that the average scores per scale differ significantly. The issue is whether this discrepancy really matters.

I would suggest that the only time repeated measures makes sense, practically speaking, is when the scores entered reflect temporal differences of interest. Such differences might include a specific measure collected, say, prior to an intervention versus afterwards, where, all going well, one might expect the average score to increase or decrease significantly.
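
A minimal sketch of that pre versus post case, using repeated-measures ANOVA in statsmodels (subject IDs, scores, and column names are invented):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Hypothetical long-format data: one score per subject per time point.
    df = pd.DataFrame({
        "subject": [1, 2, 3, 4, 5] * 2,
        "time":    ["pre"] * 5 + ["post"] * 5,
        "score":   [3.0, 2.8, 3.2, 2.9, 3.1, 3.6, 3.4, 3.9, 3.5, 3.8],
    })

    # Within-subjects (repeated measures) test of the pre versus post difference.
    print(AnovaRM(df, depvar="score", subject="subject", within=["time"]).fit())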

Where multivariate/univariate testing identifies statistically significant two-way or three-way interactions, one way to examine these is to split the dataset by one or more components of the interaction (you might need to experiment here with differing components).

The aim here is to hold one component constant (by splitting it per level) and then look for a statistically significant main effect involving the other component.

The idea behind interactions is that one of the components has distinct effects at separate levels of the other component (thus the notion of an interaction). For example, spending one or more years abroad might influence international knowledge for women but not for men.

This kind of analysis can also be done by using SPSS syntax (not available via the GUI interface), but it can be tricky.
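
As an illustration of the same split-and-test logic outside SPSS (column names and data invented), one can split by one component of the interaction and test the other component within each split:

    import pandas as pd
    from scipy import stats

    # Hypothetical gender x years-abroad design with one DV (international knowledge).
    df = pd.DataFrame({
        "gender":    ["f"] * 8 + ["m"] * 8,
        "abroad":    (["yes"] * 4 + ["no"] * 4) * 2,
        "knowledge": [4.5, 4.2, 4.8, 4.4, 3.1, 3.3, 2.9, 3.2,
                      3.5, 3.6, 3.4, 3.7, 3.6, 3.4, 3.5, 3.3],
    })

    # Hold gender constant (split per level) and test the abroad effect within each split.
    for gender, sub in df.groupby("gender"):
        yes = sub.loc[sub["abroad"] == "yes", "knowledge"]
        no = sub.loc[sub["abroad"] == "no", "knowledge"]
        f, p = stats.f_oneway(yes, no)
        print(f"gender = {gender}: F(1, {len(sub) - 2}) = {f:.2f}, p = {p:.3f}")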