Repeated measures ANOVA assumes that the variances of the differences between all combinations of related conditions (or group levels) are equal. This is known as the assumption of sphericity.
Sphericity is evaluated only for variables with more than two levels because sphericity necessarily holds for conditions with only two levels.
Violating the sphericity assumption may distort the variance calculations, resulting in a more liberal repeated measures ANOVA test (i.e., an increase in the Type I error rate). In this case, the repeated measures ANOVA must be corrected according to the degree to which sphericity has been violated. Two common corrections are used in the literature: Greenhouse-Geisser epsilon (GGe) and Huynh-Feldt epsilon (HFe).
Mauchly’s test of sphericity is used to assess whether or not the assumption of sphericity is met. It is automatically reported when using the R function anova_test() [rstatix package]. Although this test has been heavily criticized, often failing to detect departures from sphericity in small samples and over-detecting them in large samples, it is nonetheless commonly used.
In this article, you will learn how to:
- Calculate sphericity
- Compute Mauchly’s test of sphericity in R
- Interpret repeated measures ANOVA results when the assumption of sphericity is met or violated
- Extract the ANOVA table automatically corrected for deviation from sphericity.
Prerequisites
Make sure that you have installed the following R packages:
- tidyverse: for data manipulation and visualization
- ggpubr: for easily creating publication-ready plots
- rstatix: provides pipe-friendly R functions for easy statistical analyses
- datarium: contains the required data sets for this chapter
Start by loading the following required R packages:
library(tidyverse)
library(ggpubr)
library(rstatix)
Demo data
We’ll use the self-esteem score dataset measured over three time points. The data is available in the datarium package.
data("selfesteem", package = "datarium")
head(selfesteem, 3)
## # A tibble: 3 x 4
## id t1 t2 t3
## <int> <dbl> <dbl> <dbl>
## 1 1 4.01 5.18 7.11
## 2 2 2.56 6.91 6.31
## 3 3 3.24 4.44 9.78
Measuring sphericity
The procedure is as follows:
- Calculate the differences between each combination of related groups
- Compute the variance of each group difference
R code:
# 1. Compute group differences
grp.diff <- selfesteem %>%
  transmute(
    `t1-t2` = t1 - t2,
    `t1-t3` = t1 - t3,
    `t2-t3` = t2 - t3
  )
head(grp.diff, 3)
## # A tibble: 3 x 3
## `t1-t2` `t1-t3` `t2-t3`
## <dbl> <dbl> <dbl>
## 1 -1.18 -3.10 -1.93
## 2 -4.35 -3.75 0.604
## 3 -1.20 -6.53 -5.33
# 2. Compute the variances
grp.diff %>% map(var)
## $`t1-t2`
## [1] 1.3
##
## $`t1-t3`
## [1] 1.16
##
## $`t2-t3`
## [1] 3.08
From the results above, the variance of “t2-t3” appears to be much greater than the variances of “t1-t2” and “t1-t3”, suggesting that the data may violate the assumption of sphericity.
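The same two steps can be written so that they work for any number of time points. The following is a minimal sketch, assuming the wide-format selfesteem data from the datarium package; the object names (env, scores, pairs, diff_var) are illustrative and not part of the original code. The data is loaded into a separate environment so that any selfesteem object already in the workspace is left untouched.

```r
# Load a fresh wide-format copy of the data into its own environment
env <- new.env()
data("selfesteem", package = "datarium", envir = env)
scores <- as.matrix(env$selfesteem[, c("t1", "t2", "t3")])

# All unordered pairs of time points, one pair per column
pairs <- combn(colnames(scores), 2)

# Variance of the difference scores for each pair
diff_var <- apply(pairs, 2, function(p) var(scores[, p[1]] - scores[, p[2]]))
names(diff_var) <- apply(pairs, 2, paste, collapse = "-")
diff_var
```

With more than three time points, only the column selection needs to change; combn() enumerates the pairs automatically.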
To determine whether statistically significant differences exist between the variances of the differences, the formal Mauchly’s test of sphericity can be computed.
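As a cross-check that does not depend on rstatix, base R also provides mauchly.test() in the stats package, which works on a multivariate linear model fitted to the wide-format data. The sketch below assumes the wide-format selfesteem data; the object names (env, scores, mlm_fit, mt) are illustrative.

```r
# Load a fresh wide-format copy of the data into its own environment
env <- new.env()
data("selfesteem", package = "datarium", envir = env)
scores <- as.matrix(env$selfesteem[, c("t1", "t2", "t3")])

# Multivariate linear model: the three time points are the responses
mlm_fit <- lm(scores ~ 1)

# X = ~1 removes the subject mean across time points, so the test is
# applied to the within-subject contrasts
mt <- mauchly.test(mlm_fit, X = ~1)
mt
```

The returned object is a standard htest list, so the statistic and p-value are available as mt$statistic and mt$p.value.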
Computing ANOVA and Mauchly’s test
Mauchly’s test of sphericity is automatically reported by the function anova_test() [rstatix package], a wrapper around car::Anova() that makes it easy to compute repeated measures ANOVA.
Key arguments:
- data: data frame
- dv: (numeric) the dependent (or outcome) variable name.
- wid: variable name specifying the case/sample identifier.
- within: within-subjects factor or grouping variable
Data preparation: Gather columns t1, t2 and t3 into long format. Convert id and time variables into factor (or grouping) variables.
selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
head(selfesteem, 3)
## # A tibble: 3 x 3
## id time score
## <fct> <fct> <dbl>
## 1 1 t1 4.01
## 2 2 t1 2.56
## 3 3 t1 3.24
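In current versions of tidyr, gather() is superseded by pivot_longer(). The following sketch shows an equivalent reshaping, assuming the wide-format selfesteem data; the names env and se_long are illustrative, and base factor() is used in place of convert_as_factor() so that only tidyr and dplyr are needed.

```r
library(tidyr)
library(dplyr)

# Load a fresh wide-format copy of the data into its own environment
env <- new.env()
data("selfesteem", package = "datarium", envir = env)

# Reshape to long format and convert id and time to factors
se_long <- env$selfesteem %>%
  pivot_longer(cols = c(t1, t2, t3), names_to = "time", values_to = "score") %>%
  mutate(id = factor(id), time = factor(time))

head(se_long, 3)
```

Note that pivot_longer() orders the rows by subject (id 1 at t1, t2, t3, then id 2, and so on), whereas gather() stacks the columns one after another. The row order does not affect the ANOVA.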
Run ANOVA test:
res <- anova_test(data = selfesteem, dv = score, wid = id, within = time)
res
## ANOVA Table (type III tests)
##
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 1 time 2 18 55.5 2.01e-08 * 0.829
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 1 time 0.551 0.092
##
## $`Sphericity Corrections`
## Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
## 1 time 0.69 1.38, 12.42 2.16e-06 * 0.774 1.55, 13.94 6.03e-07 *
The output is a list including three tables:
- ANOVA results showing the p-value and the effect size in the column labeled ges (generalized eta squared). The effect size is essentially the amount of variability due to the within-subjects factor, ignoring the effect of the subjects.
- Mauchly’s Test of Sphericity. Only reported for variables or effects with more than 2 levels, because sphericity necessarily holds for effects with only 2 levels. The null hypothesis is that the variances of the group differences are equal. Thus, a significant p-value (p <= 0.05) indicates that the variances of group differences are not equal.
- Sphericity corrections, to be considered when the sphericity assumption cannot be maintained. Two common corrections used in the literature are provided: Greenhouse-Geisser epsilon (GGe) and Huynh-Feldt epsilon (HFe), with their corresponding p-values.
Interpreting ANOVA results
When sphericity assumption is met
In our example, Mauchly’s test of sphericity is not significant (p = 0.092, i.e. p > 0.05); there is therefore no evidence that the variances of the differences between the levels of the within-subjects factor are unequal. So, we can assume the sphericity of the covariance matrix and interpret the standard output available in the ANOVA table.
# Display ANOVA table
res$ANOVA
## Effect DFn DFd F p p<.05 ges
## 1 time 2 18 55.5 2.01e-08 * 0.829
The self-esteem score was statistically significantly different at the different time points during the diet, F(2, 18) = 55.5, p < 0.0001, eta2[g] = 0.83.
where,
- F indicates that we are comparing to an F-distribution (F-test);
- (2, 18) indicates the degrees of freedom for time and Error(time), respectively;
- 55.5 indicates the obtained F-statistic value;
- p specifies the p-value;
- ges (generalized eta squared, eta2[g]) is the effect size (amount of variability due to the within-subjects factor).
When sphericity assumption is violated
If your data violates the assumption of sphericity (i.e., Mauchly’s test, p <= 0.05), you should interpret the results from the sphericity corrections table, where the degrees of freedom have been adjusted, which in turn affects the statistical significance (i.e., the p-value) of the test. The correction is applied by multiplying DFn and DFd by the correction estimate (the Greenhouse-Geisser (GG) or Huynh-Feldt (HF) epsilon value).
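To make the degrees-of-freedom adjustment concrete, here is a small worked example using the rounded values printed in the tables above (F = 55.5, DFn = 2, DFd = 18, GGe = 0.69). The variable names are illustrative, and because the inputs are rounded, the resulting p-value only approximates the p[GG] reported by anova_test().

```r
# Values copied (rounded) from the ANOVA and sphericity correction tables
f_value <- 55.5
df_num <- 2
df_den <- 18
gge <- 0.69

# Corrected degrees of freedom: both are multiplied by epsilon
df_gg <- c(df_num, df_den) * gge
df_gg  # 1.38 and 12.42, matching the DF[GG] column

# p-value of the same F statistic under the corrected degrees of freedom
p_gg <- pf(f_value, df1 = df_gg[1], df2 = df_gg[2], lower.tail = FALSE)
p_gg
```

The F statistic itself is unchanged; only the reference distribution moves, which is why the corrected p-value is larger than the uncorrected one.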
Note that epsilon measures the degree to which sphericity has been violated. A value of 1 indicates no departure from sphericity (all variances of group differences are equal). A violation of sphericity results in an epsilon value below 1. The further epsilon is from 1, the worse the violation.
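For readers who want to see where epsilon comes from, the Greenhouse-Geisser estimate can be computed by hand from the covariance matrix of the repeated measures. The sketch below uses the standard formula based on the double-centered covariance matrix and assumes the wide-format selfesteem data; the object names are illustrative, and this is not necessarily how rstatix computes the value internally. The result should be close to the GGe value reported above.

```r
# Load a fresh wide-format copy of the data into its own environment
env <- new.env()
data("selfesteem", package = "datarium", envir = env)
scores <- as.matrix(env$selfesteem[, c("t1", "t2", "t3")])

k <- ncol(scores)   # number of within-subject levels
S <- cov(scores)    # covariance matrix of the repeated measures

# Double-center S using the centering matrix P = I - J/k
P <- diag(k) - 1 / k
S_c <- P %*% S %*% P

# Greenhouse-Geisser epsilon: tr(S_c)^2 / ((k - 1) * tr(S_c %*% S_c));
# since S_c is symmetric, tr(S_c %*% S_c) equals sum(S_c^2)
gg_eps <- sum(diag(S_c))^2 / ((k - 1) * sum(S_c^2))
gg_eps
```

By construction this estimate lies between 1 / (k - 1) and 1, so with three time points it cannot fall below 0.5.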
Greenhouse-Geisser and Huynh-Feldt corrections are given below:
res$`Sphericity Corrections`
## Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
## 1 time 0.69 1.38, 12.42 2.16e-06 * 0.774 1.55, 13.94 6.03e-07 *
It can be seen that the mean self-esteem score remains statistically significantly different at the different time points, even after the sphericity corrections (p[GG] < 0.001 and p[HF] < 0.001).
Choosing sphericity corrections methods
Of the two sphericity correction methods, the Huynh-Feldt correction is considered the less conservative (it tends to overestimate epsilon), while the Greenhouse-Geisser correction is considered the more conservative (it tends to underestimate epsilon when epsilon is close to 1).
The general recommendation is to use the Greenhouse-Geisser correction, particularly when epsilon < 0.75. When epsilon is greater than 0.75, some statisticians recommend using the Huynh-Feldt correction (Girden 1992).
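This rule of thumb can be expressed as a small helper. The function choose_correction() below is hypothetical (it is not part of rstatix), and treating an epsilon of exactly 0.75 as a Huynh-Feldt case is an assumption, since the recommendation above does not cover that boundary.

```r
# Hypothetical helper: pick a correction label from the GG epsilon,
# following the 0.75 rule of thumb (Girden 1992)
choose_correction <- function(gge, threshold = 0.75) {
  stopifnot(is.numeric(gge), length(gge) == 1, gge > 0, gge <= 1)
  if (gge < threshold) "GG" else "HF"
}

choose_correction(0.69)  # "GG" for the self-esteem example
choose_correction(0.90)  # "HF"
```

The returned label matches the values accepted by the correction argument of get_anova_table(), so it could be passed on directly.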
ANOVA table
The R function get_anova_table() [rstatix package] can be used to easily extract and interpret the ANOVA table from the output of anova_test(). It returns an ANOVA table that has been automatically corrected for possible deviation from the sphericity assumption in a design containing repeated measures factors.
For repeated measures ANOVA, the default of the function get_anova_table() is to automatically apply the Greenhouse-Geisser sphericity correction only to factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).
Usage:
get_anova_table(x, correction = c("auto", "GG", "HF", "none"))
- x: an object of class anova_test.
- correction: used only in repeated measures ANOVA test to specify which correction of the degrees of freedom should be reported for the within-subject factors. Possible values are:
  - “GG”: applies Greenhouse-Geisser correction to all within-subjects factors even if the assumption of sphericity is met (i.e., Mauchly’s test is not significant, p > 0.05),
  - “HF”: applies Huynh-Feldt correction to all within-subjects factors even if the assumption of sphericity is met,
  - “none”: returns the standard ANOVA table without any correction, and
  - “auto”: automatically applies GG correction only to within-subjects factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).
Examples:
In our example, sphericity can be assumed according to Mauchly’s test, so the standard ANOVA table is not modified with the option correction = "auto". Specifying the option correction = "GG" will apply the correction even if the assumption is met.
# correction = "auto"
get_anova_table(res)
## ANOVA Table (type III tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 time 2 18 55.5 2.01e-08 * 0.829
# correction = "GG"
get_anova_table(res, correction = "GG")
## ANOVA Table (type III tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 time 1.38 12.4 55.5 2.16e-06 * 0.829
Summary
This article describes the basics of the sphericity assumption. R code is provided to compute repeated measures ANOVA and Mauchly’s test of sphericity using the function anova_test() [rstatix package]. We also show how to interpret ANOVA results when the sphericity assumption is met or not. Finally, we introduce the R function get_anova_table() [rstatix] to easily extract and interpret the ANOVA table that is automatically corrected for possible deviation from the sphericity assumption.
References
Girden, E. 1992. “ANOVA: Repeated Measures.” Newbury Park, CA: Sage.