The independent samples t-test comes in two different forms:
- the standard Student’s t-test, which assumes that the variances of the two groups are equal.
- the Welch’s t-test, which is less restrictive than the original Student’s test. This is the test where you do not assume that the variance is the same in the two groups, which results in fractional degrees of freedom.
Note that the Welch t-test is considered the safer one. Usually, the results of the classical Student’s t-test and the Welch t-test are very similar unless both the group sizes and the standard deviations are very different.
This article describes the Welch t-test, an adaptation of the Student’s t-test for comparing the means of two independent groups when the homogeneity of variance assumption is not met. The Welch t-test is also referred to as Welch’s t-test, the unequal variance t-test or the separate variance t-test.
In this article, you will learn:
- Welch t-test formula and assumptions
- How to compute, interpret and report the Welch t-test in R
- How to check the Welch t-test assumptions
Prerequisites
Make sure you have installed the following R packages:
- tidyverse: for data manipulation and visualization
- ggpubr: for creating easily publication-ready plots
- rstatix: provides pipe-friendly R functions for easy statistical analyses
- datarium: contains the required data sets for this chapter
Start by loading the following required packages:
library(tidyverse)
library(ggpubr)
library(rstatix)
Research questions
A typical research question is: is the mean of group A (\(m_A\)) equal to the mean of group B (\(m_B\))?
Statistical hypotheses
- Null hypothesis (Ho): the two group means are identical (\(m_A = m_B\))
- Alternative hypothesis (Ha): the two group means are different (\(m_A \ne m_B\))
Formula
The Welch t-statistic is calculated as follows:
\[
t = \frac{m_A - m_B}{\sqrt{ \frac{S_A^2}{n_A} + \frac{S_B^2}{n_B} }}
\]
where \(m_A\) and \(m_B\) are the means of the two groups A and B, \(S_A\) and \(S_B\) are their standard deviations, and \(n_A\) and \(n_B\) are their sample sizes.
Unlike the classic Student’s t-test, the Welch t-test formula involves the variance of each of the two groups being compared (\(S_A^2\) and \(S_B^2\)). In other words, it does not use a pooled variance.
The degrees of freedom of the Welch t-test are estimated as follows:
\[
df = \frac{\left( \frac{S_A^2}{n_A} + \frac{S_B^2}{n_B} \right)^2}{\frac{S_A^4}{n_A^2(n_A-1)} + \frac{S_B^4}{n_B^2(n_B-1)}}
\]
A p-value can then be computed from the absolute value of the t-statistic (|t|) and these degrees of freedom.
If the p-value is less than or equal to the significance level 0.05, we can reject the null hypothesis in favor of the alternative hypothesis. In other words, we can conclude that the mean values of groups A and B are significantly different.
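To make these formulas concrete, here is a minimal R sketch, using simulated data (so the numbers are illustrative only), that computes the Welch t-statistic, the degrees of freedom and the two-sided p-value by hand, then checks the result against the base t.test() function:
# Two small simulated samples (illustrative only)
set.seed(123)
a <- rnorm(20, mean = 64, sd = 2)
b <- rnorm(20, mean = 86, sd = 4)
# Welch t-statistic: mean difference over its standard error
t.stat <- (mean(a) - mean(b)) / sqrt(var(a)/length(a) + var(b)/length(b))
# Welch-Satterthwaite degrees of freedom
va <- var(a)/length(a); vb <- var(b)/length(b)
df.w <- (va + vb)^2 / (va^2/(length(a) - 1) + vb^2/(length(b) - 1))
# Two-sided p-value from the t distribution
p <- 2 * pt(-abs(t.stat), df = df.w)
c(t = t.stat, df = df.w, p = p)
# The base function gives the same t, df and p (Welch is the default):
t.test(a, b)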
Assumptions and preliminary tests
The Welch t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group.
- No significant outliers in the two groups
- Normality. The data for each group should be approximately normally distributed.
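As a hedged sketch of how these assumptions can be checked in practice, using the rstatix helpers identify_outliers() and shapiro_test() on the genderweight demo data introduced in the next section:
library(tidyverse)
library(rstatix)
data("genderweight", package = "datarium")
# Flag outliers in each group (no extreme outliers should be present)
genderweight %>%
  group_by(group) %>%
  identify_outliers(weight)
# Shapiro-Wilk normality test by group (p > 0.05: normality is plausible)
genderweight %>%
  group_by(group) %>%
  shapiro_test(weight)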
Calculating the test in R
Demo data
Demo dataset: genderweight [in the datarium package], containing the weight of 40 individuals (20 women and 20 men).
Load the data and show some random rows by groups:
# Load the data
data("genderweight", package = "datarium")
# Show a sample of the data by group
set.seed(123)
genderweight %>% sample_n_by(group, size = 2)
## # A tibble: 4 x 3
## id group weight
## <fct> <fct> <dbl>
## 1 6 F 65.0
## 2 15 F 65.9
## 3 29 M 88.9
## 4 37 M 77.0
Summary statistics
Compute some summary statistics by group: mean and sd (standard deviation).
genderweight %>%
group_by(group) %>%
get_summary_stats(weight, type = "mean_sd")
## # A tibble: 2 x 5
## group variable n mean sd
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 F weight 20 63.5 2.03
## 2 M weight 20 85.8 4.35
Visualization
Visualize the data using box plots of weight by group.
bxp <- ggboxplot(
genderweight, x = "group", y = "weight",
ylab = "Weight", xlab = "Groups", add = "jitter"
)
bxp
Computation
We’ll use the pipe-friendly t_test() function [rstatix package], a wrapper around the R base function t.test().
Recall that, by default, R computes the Welch t-test, which is the safer one: the variance is not assumed to be the same in the two groups, which results in fractional degrees of freedom. If you want to assume the equality of variances (Student t-test), specify the option var.equal = TRUE.
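For comparison only, this is how the equal-variance (Student) version would be requested; the computation below keeps the Welch default:
# Classical Student t-test, assuming equal variances (for comparison only)
genderweight %>% t_test(weight ~ group, var.equal = TRUE)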
stat.test <- genderweight %>%
t_test(weight ~ group) %>%
add_significance()
stat.test
## # A tibble: 1 x 9
## .y. group1 group2 n1 n2 statistic df p p.signif
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 weight F M 20 20 -20.8 26.9 4.30e-18 ****
The results above show the following components:
- .y.: the y variable used in the test.
- group1, group2: the compared groups in the pairwise tests.
- statistic: test statistic used to compute the p-value.
- df: degrees of freedom.
- p: p-value.
Note that you can obtain a detailed result by specifying the option detailed = TRUE.
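For example, the call below returns the detailed table; the extra columns (group estimates, the confidence interval of the difference, the method used) come from the underlying t.test() output:
# Same test, detailed result table
genderweight %>% t_test(weight ~ group, detailed = TRUE)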
Cohen’s d for Welch t-test
The effect size can be computed by dividing the mean difference between the groups by the “averaged” standard deviation.
Cohen’s d formula:
d = (mean1 - mean2) / sqrt((var1 + var2)/2), where:
- mean1 and mean2 are the means of each group, respectively
- var1 and var2 are the variances of the two groups
Calculation:
genderweight %>% cohens_d(weight ~ group, var.equal = FALSE)
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 weight F M -6.57 20 20 large
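To see where this value comes from, here is a minimal manual computation of the formula above; the sign only reflects the order of the groups (F minus M):
# Manual Cohen's d with the 'averaged' standard deviation denominator
stats <- genderweight %>%
  group_by(group) %>%
  summarise(m = mean(weight), v = var(weight))
with(stats, (m[1] - m[2]) / sqrt((v[1] + v[2]) / 2))  # approx. -6.57, matching cohens_d() above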
Report
We could report the result as follows:
The mean weight in the female group was 63.5 (SD = 2.03), whereas the mean in the male group was 85.8 (SD = 4.35). A Welch two-samples t-test showed that the difference was statistically significant, t(26.9) = -20.8, p < 0.0001, d = 6.57; where t(26.9) is shorthand notation for a Welch t-statistic that has 26.9 degrees of freedom.
stat.test <- stat.test %>% add_xy_position(x = "group")
bxp +
stat_pvalue_manual(stat.test, tip.length = 0) +
labs(subtitle = get_test_label(stat.test, detailed = TRUE))
Summary
This article describes the formula and the basics of the Welch t-test. Examples of R code are provided for computing the test and the effect size, and for interpreting and reporting the results.