This chapter describes methods for testing the homogeneity of variances across two or more groups in R.
Some statistical tests, such as the two independent samples t-test and the ANOVA test, assume that variances are equal across groups.
Several tests can be used to assess the equality of variances. These include:
- F-test: Compare the variances of two groups. The data must be normally distributed.
- Bartlett’s test: Compare the variances of two or more groups. The data must be normally distributed.
- Levene’s test: A robust alternative to Bartlett’s test that is less sensitive to departures from normality.
- Fligner-Killeen’s test: A non-parametric test that is very robust against departures from normality.
Note that Levene’s test is the most commonly used in the literature.
You will learn how to compare variances in R using each of the tests mentioned above.
Prerequisites
Load the tidyverse package for easy data manipulation:
library(tidyverse)
Demo dataset: ToothGrowth. Inspect the data by displaying some random rows.
# Data preparation
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Inspect
set.seed(123)
sample_n(ToothGrowth, 6)
## len supp dose
## 1 14.5 VC 1
## 2 25.8 OJ 1
## 3 25.5 VC 2
## 4 25.5 OJ 2
## 5 22.4 OJ 2
## 6 7.3 VC 0.5
F-test: Compare two variances
The F-test is used to assess whether the variances of two populations (A and B) are equal. Check that the data are normally distributed (see the chapter on normality tests) before using the F-test.
Applications. Comparing two variances is useful in several cases, including:
- When you want to perform a two-sample t-test, you need to check whether the variances of the two samples are equal.
- When you want to compare the variability of a new measurement method to an old one. Does the new method reduce the variability of the measure?
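The second application asks a directional question (is the new method less variable?), so a one-sided test fits better than the default two-sided one. The sketch below is a hypothetical illustration on simulated data: the vectors old_method and new_method are generated here and do not come from a real study. It relies on the alternative argument of var.test().

```r
# Hypothetical example: simulated measurements for an old and a new method.
# The data are generated here for illustration only.
set.seed(42)
old_method <- rnorm(25, mean = 10, sd = 2)
new_method <- rnorm(25, mean = 10, sd = 1)

# One-sided F-test.
# H0: var(new) = var(old); Ha: var(new) < var(old),
# i.e. the ratio var(new) / var(old) is less than 1.
res_less <- var.test(new_method, old_method, alternative = "less")
res_less$statistic  # observed variance ratio var(new) / var(old)
res_less$p.value    # small values support a reduction in variability
```

With alternative = "less", the p-value is the lower-tail probability of the F distribution, pf(F, df1, df2), instead of twice the smaller tail.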
The statistical hypotheses are:
- Null hypothesis (H0): the variances of the two groups are equal.
- Alternative hypothesis (Ha): the variances are different.
Computation. The F-test statistic can be obtained by computing the ratio of the two variances, Var(A)/Var(B). The more this ratio deviates from 1, the stronger the evidence for unequal population variances.
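To make the computation concrete, the sketch below reproduces the two-sided F-test by hand on the ToothGrowth data, taking OJ as group A and VC as group B (the order var.test() uses for the formula len ~ supp, following the factor levels). The names len_oj, len_vc and f_stat are illustrative. Each group contributes n - 1 degrees of freedom, and the two-sided p-value doubles the smaller tail probability of the F distribution.

```r
# Minimal sketch: the F-test computed by hand on the built-in ToothGrowth data
# (len by supp). var.test() performs the same calculation.
len_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
len_vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Test statistic: ratio of the two sample variances, Var(A) / Var(B)
f_stat <- var(len_oj) / var(len_vc)

# Degrees of freedom: n - 1 for each group
df_num <- length(len_oj) - 1
df_den <- length(len_vc) - 1

# Two-sided p-value: twice the smaller tail probability of F(df_num, df_den)
p_lower <- pf(f_stat, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

c(F = f_stat, num_df = df_num, denom_df = df_den, p_value = p_value)
```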
The F-test can be easily computed in R using the function var.test(). In the following R code, we want to test the equality of variances between the two groups OJ and VC (in the column “supp”) for the variable len.
res <- var.test(len ~ supp, data = ToothGrowth)
res
##
## F test to compare two variances
##
## data: len by supp
## F = 0.6, num df = 30, denom df = 30, p-value = 0.2
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.304 1.342
## sample estimates:
## ratio of variances
## 0.639
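Interpretation. The p-value is p = 0.2 (about 0.23 before rounding), which is greater than the significance level 0.05. In conclusion, there is no significant difference between the two variances. Note that the output above is printed with very few significant digits: with 30 observations per group, the exact degrees of freedom are 29 and 29, not 30.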
Compare multiple variances
This section describes how to compare multiple variances in R using Bartlett’s, Levene’s, or Fligner-Killeen’s tests. The examples use the PlantGrowth and ToothGrowth datasets, which ship with R.
Statistical hypotheses. For all the tests that follow, the null hypothesis is that all population variances are equal; the alternative hypothesis is that at least two of them differ. Consequently, p-values less than 0.05 suggest that the variances are significantly different and that the homogeneity of variance assumption has been violated.
Bartlett’s test
- Bartlett’s test with one independent variable:
res <- bartlett.test(weight ~ group, data = PlantGrowth)
res
##
## Bartlett test of homogeneity of variances
##
## data: weight by group
## Bartlett's K-squared = 3, df = 2, p-value = 0.2
From the output, the p-value (0.237, shown rounded to 0.2) is not less than the significance level of 0.05. This means that there is no evidence that the variance in plant growth differs significantly across the three groups.
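Bartlett’s statistic compares the log of the pooled variance with the logs of the individual group variances; it is 0 when all group variances are identical. The sketch below recomputes it step by step on PlantGrowth as an illustration of the formula, not as a replacement for bartlett.test(); the intermediate names (n_i, s2_i, s2_p) are our own.

```r
# Bartlett's K-squared computed by hand on the built-in PlantGrowth data
# (weight by group). bartlett.test() implements the same formula.
n_i  <- tapply(PlantGrowth$weight, PlantGrowth$group, length)  # group sizes
s2_i <- tapply(PlantGrowth$weight, PlantGrowth$group, var)     # group variances
k <- length(n_i)  # number of groups
N <- sum(n_i)     # total sample size

# Pooled variance: weighted average of the group variances
s2_p <- sum((n_i - 1) * s2_i) / (N - k)

# Numerator: log pooled variance versus logs of the group variances
num <- (N - k) * log(s2_p) - sum((n_i - 1) * log(s2_i))
# Bartlett's correction factor
den <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))

k_squared <- num / den
# Under H0, K-squared approximately follows a chi-squared distribution with k - 1 df
p_value <- pchisq(k_squared, df = k - 1, lower.tail = FALSE)

c(K_squared = k_squared, df = k - 1, p_value = p_value)
```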
- Bartlett’s test with multiple independent variables: the interaction() function must be used to collapse multiple factors into a single variable containing all combinations of the factors.
bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)
##
## Bartlett test of homogeneity of variances
##
## data: len by interaction(supp, dose)
## Bartlett's K-squared = 7, df = 5, p-value = 0.2
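To see what interaction() actually passes to the test, the sketch below builds the combined grouping factor explicitly from a copy of ToothGrowth (tg and grp are illustrative names). Each level joins a supp level and a dose level with a dot, which gives 2 x 3 = 6 groups of 10 observations, matching the df = 5 in the output above.

```r
# Build the single grouping factor that interaction() creates from supp and dose
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
grp <- interaction(tg$supp, tg$dose)

levels(grp)  # one level per supp x dose combination (2 x 3 = 6)
table(grp)   # number of observations in each combination

# Same test as the formula version, using the explicit grouping factor
bartlett.test(tg$len, grp)
```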
Levene’s test
The function leveneTest() [in the car package] can be used.
library(car)
# Levene's test with one independent variable
leveneTest(weight ~ group, data = PlantGrowth)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.12 0.34
## 27
# Levene's test with multiple independent variables
leveneTest(len ~ supp*dose, data = ToothGrowth)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 5 1.71 0.15
## 54
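The output header shows center = median, which is the default of leveneTest() (often called the Brown-Forsythe variant; center = mean gives Levene’s original version). With that default, the test is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces this on PlantGrowth without the car package; pg, abs_dev and aov_table are illustrative names.

```r
# Levene's test (center = median) computed by hand on the built-in PlantGrowth data
pg <- PlantGrowth

# Step 1: absolute deviation of each observation from its group median
pg$abs_dev <- abs(pg$weight - ave(pg$weight, pg$group, FUN = median))

# Step 2: one-way ANOVA on the absolute deviations;
# the F value and Pr(>F) in the 'group' row are the Levene statistic and p-value
aov_table <- anova(lm(abs_dev ~ group, data = pg))
aov_table
```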
Fligner-Killeen’s test
The Fligner-Killeen test is one of the many tests for homogeneity of variances, and one of the most robust against departures from normality.
The R function fligner.test() can be used to compute the test:
fligner.test(weight ~ group, data = PlantGrowth)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: weight by group
## Fligner-Killeen:med chi-squared = 2, df = 2, p-value = 0.3
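The “med” in the output refers to median centering. As a sketch of what the test does, assuming the standard median-centred Fligner-Killeen statistic, the code below centres each group at its median, converts the ranks of the absolute deviations into normal scores, and compares the group sums of those scores with a chi-squared statistic on k - 1 degrees of freedom. Working on ranks rather than raw values is what makes the test robust to non-normal data. The variable names are illustrative.

```r
# Fligner-Killeen (median-centred) statistic computed by hand on PlantGrowth
x <- PlantGrowth$weight
g <- PlantGrowth$group  # must be a factor: indexing below uses its integer codes
n <- length(x)
k <- nlevels(g)

# Step 1: centre each observation at its group median
x_centered <- x - tapply(x, g, median)[g]

# Step 2: rank the absolute centred values and map ranks to normal scores
a <- qnorm((1 + rank(abs(x_centered)) / (n + 1)) / 2)

# Step 3: compare group score totals with what equal variances would give
stat <- sum(tapply(a, g, sum)^2 / tapply(a, g, length))
stat <- (stat - n * mean(a)^2) / var(a)
p_value <- pchisq(stat, df = k - 1, lower.tail = FALSE)

c(med_chi_squared = stat, df = k - 1, p_value = p_value)
```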
Summary
This article presents different tests for assessing the equality of variances between groups, an assumption made by the two independent samples t-test and the ANOVA test.
The most commonly used method is Levene’s test, available in the car package. A pipe-friendly wrapper, levene_test(), is also provided in the rstatix package.
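A minimal sketch of the rstatix wrapper, assuming the rstatix package is installed; the call is guarded so that it does nothing otherwise, and lev is an illustrative name. Because the data frame is the first argument, the same call can follow a pipe, for example PlantGrowth %>% levene_test(weight ~ group).

```r
# Requires the rstatix package; skipped if it is not installed
if (requireNamespace("rstatix", quietly = TRUE)) {
  # Data first, formula second: the same signature that makes it pipe-friendly
  lev <- rstatix::levene_test(PlantGrowth, weight ~ group)
  print(lev)  # tibble with df1, df2, statistic and p
}
```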