Course description
This course describes how to compare multiple means in R using the ANOVA (Analysis of Variance) method and variants, including:
- ANOVA test for comparing independent measures.
- Repeated-measures ANOVA, which is used for analyzing data where same subjects are measured more than once.
- Mixed ANOVA, which is used to compare the means of groups cross-classified by at least two factors, where one factor is a “within-subjects” factor (repeated measures) and the other factor is a “between-subjects” factor.
- ANCOVA (analyse of covariance), an extension of the one-way ANOVA that incorporate a covariate variable.
- MANOVA (multivariate analysis of variance), an ANOVA with two or more continuous outcome variables.
We also provide R code to check ANOVA assumptions and perform Post-Hoc analyses. Additionally, we’ll present:
- Kruskal-Wallis test, which is a non-parametric alternative to the one-way ANOVA test.
- Friedman test, which is a non-parametric alternative to the one-way repeated measures ANOVA test.
Related Book
Practical Statistics in R II - Comparing Groups: Numerical VariablesR functions and packages
There are different functions/packages in R for computing ANOVA. These include:
aov()
[stats]: Computes type I sum of squares (SS). Should be only used when you have balanced designs (group sizes are equal).Anova()
[car]: Computes type-II and type-III sum of squares. Type-II will yield identical ANOVA results as type-I when the data are balanced. When data are unbalanced, type-III will emulate the approach taken by popular commercial statistics packages like SAS and SPSS, but this approach is not without criticism.ezANOVA()
[ez],car_aov()
[afex] andanova_test()
[rstatix]: Wrappers around the functionAnova()
[car] for facilitating the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.
The advantage of anova_test()
[rstatix] is that it supports both model and formula as inputs. Variables can be also specified as character vector using the arguments dv
, wid
, between
, within
, covariate
. Read more in the documentation by typing ?anova_test
in R console. It provides a simple and intuitive pipe-friendly framework, coherent with the tidyverse
design philosophy. Additionally, it supports grouped data as returned by the function dplyr::group_by()
. The results include ANOVA table, generalized effect size and some assumption checks.
In this guide, we’ll use mainly the function anova_test()
.
Recommendations
- The outcome variable, also known as dependent variable (dv), should be numeric
- The grouping variables, also known as predictors or independent variables, should be factors. If you want to compute ANCOVA models, you can also add numeric predictors.
- Do not use the R base functions aov() and anova() to get ANOVA tables unless you know what you are doing. They compute the type-I sum of squares, which is not, for example, suitable for unbalanced designs. The results, obtained with the default options of theses functions, are different from those obtained with commercial stats softwares, including SPSS and SAS, and most other stats packages. These differences are important and will be confusing and give you misleading results unless you understand them.
Follow the recommendations below:
- If you have a factorial design with independent measures, you can define your model using
lm()
and then userstatix::anova_test()
orcar::Anova()
to calculate F tests. - If you have perfect balanced repeated measures design with no missing values, then use
rstatix::anova_test()
. - If you have an unbalanced repeated measures design, or you repeated measures with missing data, use linear mixed models instead via the
lme4::lmer()
.
Version: Français
Having issues with trying to use ggplot to display emmeans pairwise results. I am wanting to see the contrast between all years not just within the same year.
r script:
library(emmeans) ### estimated marginal means and p values
library(sjstats) ### partial eta squared and cohens f effect size
library(lme4) ### estimated the multi level model (random intercept for participants)
library(lmerTest) ### gives more comprehsive anova output with p values
library(MuMIn) ### R2 for the model
library(tidyverse)
library(ggpubr)
library(rstatix)
library(dplyr)
library(readxl)
library(scales)
library(ggsignif)
library(sjPlot)
#load data for One way linear mixed models, repeated measures ##########################
glm_test <- read_excel("C:/Users/maber/Documents/Clover Valley/CLOVER VALLEY/veg data/LPI/UPL_cover.xlsx")
view(glm_test)
head(glm_test, 3)
#factor data#############tell it you have factors and what they are called##############
glm_test$year <- factor(glm_test$year,
levels = c(1, 2, 3),
labels = c("2018", "2019", "2020"))
glm_test$transect <- factor(glm_test$transect,
levels = c(1, 2,3),
labels = c("Lower Meadow", "Middle Meadow", "Upper Meadow"))
###R knows your factors, the below command brakes down your variables
summary(glm_test)
#run linear model with cover as dependent varible, year as #############################
#independent var. for the plot by stating + (1| plot), This makes it a repeated measures
#we call our linear model lmer1
#set up the model use the following code
lmer2 <- lmer(cover ~ as.factor(year) * as.factor(transect) + (1|plot),
data = glm_test)
#linear model dep variable predicted by the indep variable (year)#####################
#run the model with the following code
anova(lmer2)
summary(aov(cover ~ as.factor(year) * as.factor(transect) + Error (plot),
data = glm_test))
#post Hoc comparisons###################################################
#emmeans(lmer2, list(pairwise ~ year), adjust = "bonferroni")
emmeans(lmer2, list(pairwise ~ year), adjust = "tukey")
# if there was an interaction your code would be this###################
emmeans(lmer2, pairwise ~ year | transect)
tab_model(lmer2)
#Visulize data #########################################################
ggboxplot(glm_test, x = "transect", y = "cover", title = "Total % Cover of Upland Species",
ylab = " % Cover UPL Species",
color = "year", ggtheme = theme_gray(base_size = 14))+
scale_x_discrete(labels = label_wrap_gen(7)) +
stat_compare_means(comparisons = my_comparisons, label.y = c(65, 75, 80))+
stat_compare_means(label.y = 82)
The results I want to plot are:
transect = Upper Meadow:
contrast estimate SE df t.ratio p.value
2018 – 2019 27.500 7.15 38 3.848 0.0013
2018 – 2020 20.000 7.15 38 2.798 0.0214
2019 – 2020 -7.500 7.15 38 -1.049 0.5509
Link to HTML doc. file:///C:/Users/maber/Documents/Clover%20Valley/CLOVER%20VALLEY/veg%20data/one%20way%20linear%20mixed%20model,%20repeated%20Meas/two-way_liner_mixed_model_repeated_measures.html
data:
plot transect year cover
D11_NORTH 1 1 24
D11_SOUTH 1 1 16
R27_NORTH 1 1 18
R27_SOUTH 1 1 8
R28_NORTH 1 1 64
R28_SOUTH 1 1 54
D11_NORTH 1 2 46
D11_SOUTH 1 2 38
R27_NORTH 1 2 2
R27_SOUTH 1 2 0
R28_NORTH 1 2 20
R28_SOUTH 1 2 34
D11_NORTH 1 3 44
D11_SOUTH 1 3 36
R27_NORTH 1 3 0
R27_SOUTH 1 3 0
R28_NORTH 1 3 26
R28_SOUTH 1 3 38
D13_NORTH 2 1 44
D13_SOUTH 2 1 30
D14_NORTH 2 1 0
D14_SOUTH 2 1 6
R25_NORTH 2 1 14
R25_SOUTH 2 1 2
R26_NORTH 2 1 20
R26_SOUTH 2 1 42
D13_NORTH 2 2 6
D13_SOUTH 2 2 0
D14_NORTH 2 2 2
D14_SOUTH 2 2 0
R25_NORTH 2 2 0
R25_SOUTH 2 2 14
R26_NORTH 2 2 0
R26_SOUTH 2 2 0
D13_NORTH 2 3 2
D13_SOUTH 2 3 8
D14_NORTH 2 3 0
D14_SOUTH 2 3 10
R25_NORTH 2 3 16
R25_SOUTH 2 3 0
R26_NORTH 2 3 18
R26_SOUTH 2 3 0
R21_NORTH 3 1 32
R21_SOUTH 3 1 82
R22_NORTH 3 1 52
R22_SOUTH 3 1 46
R23_NORTH 3 1 64
R23_SOUTH 3 1 68
R24_NORTH 3 1 8
R24_SOUTH 3 1 8
R21_NORTH 3 2 10
R21_SOUTH 3 2 16
R22_NORTH 3 2 10
R22_SOUTH 3 2 22
R23_NORTH 3 2 48
R23_SOUTH 3 2 28
R24_NORTH 3 2 4
R24_SOUTH 3 2 2
R21_NORTH 3 3 6
R21_SOUTH 3 3 10
R22_NORTH 3 3 16
R22_SOUTH 3 3 14
R23_NORTH 3 3 54
R23_SOUTH 3 3 82
R24_NORTH 3 3 16
R24_SOUTH 3 3 2