This chapter provides quick-start R code for computing the main statistical measures of inter-rater reliability or agreement. These include:
- Cohen’s Kappa: It can be used for two nominal or two ordinal variables. It counts only exact agreements between the two raters, so it is most appropriate for two nominal variables.
- Weighted Kappa: It should be considered for two ordinal variables only. It gives partial credit to ratings that are close but not identical.
- Light’s Kappa: the average of the pairwise Cohen’s Kappa values when there are more than two raters (categorical variables).
- Fleiss’ Kappa: for two or more categorical variables (nominal or ordinal).
- Intraclass correlation coefficient (ICC): for continuous or ordinal data.
R packages
There are many R packages and functions for inter-rater agreement analyses, including:
Measures | R function [package] |
---|---|
Cohen’s kappa | Kappa() [vcd], kappa2() [irr] |
Weighted kappa | Kappa() [vcd], kappa2() [irr] |
Light’s kappa | kappam.light() [irr] |
Fleiss’ kappa | kappam.fleiss() [irr] |
ICC | icc() [irr], ICC() [psych] |
Prerequisites
In the next sections, we’ll use only the functions from the irr package. Make sure you have installed it.
Load the package:
# install.packages("irr")
library(irr)
Example data
We’ll use two data sets:
- diagnoses: psychiatric diagnoses data provided by 6 raters [irr package]. A total of 30 patients were enrolled and classified by each of the raters into 5 nominal categories (Fleiss and others 1971): 1. Depression, 2. Personality Disorder, 3. Schizophrenia, 4. Neurosis, 5. Other.
- anxiety: anxiety data [irr package], which contains the anxiety ratings of 20 subjects, rated by 3 raters on an ordinal scale. Values range from 1 (not anxious at all) to 6 (extremely anxious).
Inspect the data:
# Diagnoses data
data("diagnoses", package = "irr")
head(diagnoses[, 1:3])
## rater1 rater2 rater3
## 1 4. Neurosis 4. Neurosis 4. Neurosis
## 2 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 3 2. Personality Disorder 3. Schizophrenia 3. Schizophrenia
## 4 5. Other 5. Other 5. Other
## 5 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 6 1. Depression 1. Depression 3. Schizophrenia
# Anxiety data
data("anxiety", package = "irr")
head(anxiety, 4)
## rater1 rater2 rater3
## 1 3 3 2
## 2 3 6 1
## 3 3 4 4
## 4 4 6 4
Cohen’s Kappa: two raters
Cohen’s kappa corresponds to the unweighted kappa. It can be used for two nominal or two ordinal categorical variables.
kappa2(diagnoses[, c("rater1", "rater2")], weight = "unweighted")
## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 30
## Raters = 2
## Kappa = 0.651
##
## z = 7
## p-value = 2.63e-12
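To make the reported Kappa less of a black box, here is a minimal base-R sketch of the unweighted statistic, kappa = (po - pe) / (1 - pe), where po is the observed proportion of agreement and pe is the agreement expected by chance from the two raters' marginal frequencies. The helper name cohen_kappa() and the toy ratings are hypothetical and serve only as an illustration; kappa2() remains the function to use in practice.

```r
# Sketch: unweighted Cohen's kappa computed by hand (base R only).
# cohen_kappa() is a hypothetical helper, not part of the irr package.
cohen_kappa <- function(r1, r2) {
  r1 <- as.character(r1)
  r2 <- as.character(r2)
  lv <- sort(union(r1, r2))                    # shared set of categories
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  p <- tab / sum(tab)                          # joint proportions
  po <- sum(diag(p))                           # observed agreement
  pe <- sum(rowSums(p) * colSums(p))           # chance agreement
  (po - pe) / (1 - pe)
}

# Toy ratings for six subjects (made up for illustration)
toy_r1 <- c("a", "a", "b", "b", "c", "c")
toy_r2 <- c("a", "a", "b", "c", "c", "a")

# po = 4/6 and pe = 1/3, so kappa = 0.5
toy_kappa <- cohen_kappa(toy_r1, toy_r2)
print(toy_kappa)
```

Assuming the irr package is loaded, cohen_kappa(diagnoses$rater1, diagnoses$rater2) should give a value close to the Kappa reported by kappa2() above.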
Weighted kappa: ordinal scales
Weighted kappa should be considered only when ratings are made on an ordinal scale, as in the following example.
kappa2(anxiety[, c("rater1", "rater2")], weight = "equal")
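The idea of partial agreement can be sketched in base R. The example below assumes linear weights, w_ij = 1 - |i - j| / (k - 1), which is our reading of what weight = "equal" means in kappa2(); with weight = "squared" the distance would be squared instead. The helper weighted_kappa() and the toy ratings are hypothetical.

```r
# Sketch: linearly weighted kappa computed by hand (base R only).
# weighted_kappa() is a hypothetical helper, not part of the irr package.
weighted_kappa <- function(r1, r2, levels) {
  k <- length(levels)
  tab <- table(factor(r1, levels = levels), factor(r2, levels = levels))
  p <- tab / sum(tab)                               # observed joint proportions
  expected <- outer(rowSums(p), colSums(p))         # proportions expected by chance
  # Linear agreement weights: 1 on the diagonal, decreasing with distance
  w <- 1 - abs(outer(seq_len(k), seq_len(k), "-")) / (k - 1)
  po <- sum(w * p)                                  # weighted observed agreement
  pe <- sum(w * expected)                           # weighted chance agreement
  (po - pe) / (1 - pe)
}

# Toy ordinal ratings on a 1-4 scale (made up for illustration)
ord_r1 <- c(1, 2, 3, 4, 2, 3)
ord_r2 <- c(1, 3, 3, 4, 1, 2)

# Weighted po = 5/6 and weighted pe = 11/18, so kappa = 4/7
toy_wkappa <- weighted_kappa(ord_r1, ord_r2, levels = 1:4)
print(toy_wkappa)
```

In this toy example the three disagreements are all off by one category, so each still earns a weight of 2/3 rather than 0.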
Light’s kappa: multiple raters
Light’s kappa returns the average of the Cohen’s kappa values computed for every pair of raters when you have more than two raters.
kappam.light(diagnoses[, 1:3])
## Light's Kappa for m Raters
##
## Subjects = 30
## Raters = 3
## Kappa = 0.555
##
## z = NaN
## p-value = NaN
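The averaging step can be sketched in base R as follows: compute the unweighted Cohen's kappa for every pair of raters, then take the mean. The helpers cohen_kappa() and light_kappa() and the toy data frame are hypothetical; this is an illustration of the idea, not the code used inside kappam.light().

```r
# Sketch: Light's kappa as the mean of all pairwise Cohen's kappas (base R only).
# Both helpers are hypothetical and not part of the irr package.
cohen_kappa <- function(r1, r2) {
  r1 <- as.character(r1)
  r2 <- as.character(r2)
  lv <- sort(union(r1, r2))
  p <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  p <- p / sum(p)
  po <- sum(diag(p))                     # observed agreement
  pe <- sum(rowSums(p) * colSums(p))     # chance agreement
  (po - pe) / (1 - pe)
}

light_kappa <- function(ratings) {
  pairs <- combn(ncol(ratings), 2)       # one column per pair of raters
  kappas <- apply(pairs, 2, function(idx) {
    cohen_kappa(ratings[[idx[1]]], ratings[[idx[2]]])
  })
  mean(kappas)
}

# Toy ratings: six subjects, three raters (made up for illustration)
toy_ratings <- data.frame(
  rater1 = c("a", "a", "b", "b", "c", "c"),
  rater2 = c("a", "a", "b", "c", "c", "a"),
  rater3 = c("a", "b", "b", "b", "c", "c")
)

# Pairwise kappas are 1/2, 3/4 and 4/13, so the mean is 27/52
toy_light <- light_kappa(toy_ratings)
print(toy_light)
```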
Fleiss’ kappa: multiple raters
Fleiss’ kappa does not assume that the same raters rated every subject.
kappam.fleiss(diagnoses[, 1:3])
## Fleiss' Kappa for m Raters
##
## Subjects = 30
## Raters = 3
## Kappa = 0.534
##
## z = 9.89
## p-value = 0
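For readers who want to see where this number comes from, here is a base-R sketch of the standard Fleiss formula: count how many raters put each subject into each category, compute the per-subject agreement P_i, and compare its mean with the chance agreement derived from the overall category proportions. The helper fleiss_kappa() and the toy ratings are hypothetical, and the sketch assumes every subject has the same number of ratings.

```r
# Sketch: Fleiss' kappa computed by hand (base R only).
# fleiss_kappa() is a hypothetical helper, not part of the irr package.
fleiss_kappa <- function(ratings) {
  ratings <- as.matrix(ratings)
  lv <- sort(unique(as.vector(ratings)))   # all categories in use
  n <- ncol(ratings)                       # ratings per subject
  # counts[i, j] = number of raters assigning subject i to category j
  counts <- t(apply(ratings, 1, function(x) table(factor(x, levels = lv))))
  p_i <- (rowSums(counts^2) - n) / (n * (n - 1))  # agreement within each subject
  p_j <- colSums(counts) / sum(counts)            # overall category proportions
  p_bar <- mean(p_i)                              # mean observed agreement
  p_e <- sum(p_j^2)                               # chance agreement
  (p_bar - p_e) / (1 - p_e)
}

# Toy ratings: six subjects, three raters (made up for illustration)
toy_ratings <- data.frame(
  rater1 = c("a", "a", "b", "b", "c", "c"),
  rater2 = c("a", "a", "b", "c", "c", "a"),
  rater3 = c("a", "b", "b", "b", "c", "c")
)

# Mean P_i = 2/3 and chance agreement = 1/3, so kappa = 0.5
toy_fleiss <- fleiss_kappa(toy_ratings)
print(toy_fleiss)
```

Only the counts per category enter the formula, not which rater gave which rating, which is why the raters do not need to be the same for all subjects.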
Intraclass correlation coefficients: continuous scales
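Read more in the chapter on the intraclass correlation coefficient. The following call computes a two-way, single-rater, absolute-agreement ICC for the anxiety data:
icc(
  anxiety, model = "twoway",
  type = "agreement", unit = "single"
)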
## Single Score Intraclass Correlation
##
## Model: twoway
## Type : agreement
##
## Subjects = 20
## Raters = 3
## ICC(A,1) = 0.198
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(19,39.7) = 1.83 , p = 0.0543
##
## 95%-Confidence Interval for ICC Population Values:
## -0.039 < ICC < 0.494
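As a rough illustration of what ICC(A,1) measures, the sketch below computes it in base R from the two-way ANOVA mean squares, using the usual formula (MSR - MSE) / (MSR + (k - 1) MSE + (k / n)(MSC - MSE)), where MSR, MSC and MSE are the mean squares for subjects, raters and residual error, n is the number of subjects and k the number of raters. We assume this matches the "twoway", "agreement", "single" setting used above; the helper icc_a1() and the toy scores are hypothetical.

```r
# Sketch: single-score, absolute-agreement ICC from two-way mean squares (base R only).
# icc_a1() is a hypothetical helper, not part of the irr package.
icc_a1 <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)                                        # subjects
  k <- ncol(x)                                        # raters
  gm <- mean(x)                                       # grand mean
  msr <- k * sum((rowMeans(x) - gm)^2) / (n - 1)      # between-subject mean square
  msc <- n * sum((colMeans(x) - gm)^2) / (k - 1)      # between-rater mean square
  sse <- sum((x - gm)^2) - (n - 1) * msr - (k - 1) * msc
  mse <- sse / ((n - 1) * (k - 1))                    # residual mean square
  (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))
}

# Toy scores: three subjects (rows) rated by two raters (columns), made up
toy_scores <- matrix(c(1, 2,
                       3, 3,
                       5, 7), ncol = 2, byrow = TRUE)

# MSR = 10.5, MSC = 1.5, MSE = 0.5, so ICC(A,1) = 6/7
toy_icc <- icc_a1(toy_scores)
print(toy_icc)
```

Because the rater mean square MSC appears in the denominator, a systematic shift between raters lowers this agreement-type ICC even when the raters rank the subjects identically.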
Summary
This article describes how to compute the different inter-rater agreement measures using the irr package.
References
Fleiss, J.L., and others. 1971. “Measuring Nominal Scale Agreement Among Many Raters.” Psychological Bulletin 76 (5): 378–82.