GGPlot Examples Best Reference

13 Jan

GGPlot Examples Best Reference

This article provides a gallery of ggplot examples, including: scatter plot, density plots and histograms, bar and line plots, error bars, box plots, violin plots and more.

Contents:

Prerequisites
Scatter plot
Distribution
Bar charts and alternatives
Line plot
Error bars
Box plots and alternatives
Time series data visualization
scatter plot matrix
Correlation analysis
Cluster analysis
Balloon plot

Related Book

GGPlot2 Essentials for Great Data Visualization in R

Prerequisites

Load required packages and set the theme function theme_bw() as the default theme:

library(tidyverse)
library(ggpubr) 
theme_set(
  theme_bw() + 
  theme(legend.position = "top")
  )

Scatter plot

Basic scatter plot with correlation coefficient. The function stat_cor() [ggpubr R package] is used to add the correlation coefficient.

library("ggpubr")
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_smooth(method = lm) +
  stat_cor(method = "pearson", label.x = 20)
p

Contextual zoom. Key R function facet_zoom() [ggforce]

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
    geom_point() +
    facet_zoom(x = Species == "versicolor")

Encircle some points. The function geom_encircle() [ggalt R package] can be used to encircle a certain group of points

# Encircle setosa group
library("ggalt")
circle.df <- iris %>% filter(Species == "setosa")
ggplot(iris, aes(Petal.Length, Petal.Width)) +
    geom_point(aes(colour = Species)) + 
  geom_encircle(data = circle.df, linetype = 2)

Create jittered points to avoid overlap. The overlapping points are randomly jittered around their original position based on a threshold controlled by the width argument in the function `geom_jitter()

# Basic scatter plot
ggplot(mpg, aes(cty, hwy)) +
  geom_point(size = 0.5)

# Jittered points
ggplot(mpg, aes(cty, hwy)) +
  geom_jitter(size = 0.5, width = 0.5)

Create count charts to avoid overlap. Wherever there is more points overlap, the size of the circle gets bigger.

ggplot(mpg, aes(cty, hwy)) +
  geom_count()

Bubble chart. In a bubble chart, points size is controlled by a continuous variable, here qsec.

ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(size = qsec), alpha = 0.5) +
  scale_size(range = c(0.5, 12))  # Adjust the range of points size

Marginal density plots

library(ggpubr)
# Grouped Scatter plot with marginal density plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.params = list(fill = "Species", color = "black", size = 0.2)
  )

# Use box plot as marginal plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.plot = "boxplot",
  ggtheme = theme_bw()
  )

Distribution

Density plot

Basic density plot:

# Basic density plot
ggplot(iris, aes(Sepal.Length)) +
  geom_density()

# Add mean line
ggplot(iris, aes(Sepal.Length)) +
  geom_density(fill = "lightgray") +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

Change color by groups

# Change line color by groups
ggplot(iris, aes(Sepal.Length, color = Species)) +
  geom_density() +
  scale_color_viridis_d()

# Add mean line by groups
mu <- iris %>%
  group_by(Species) %>%
  summarise(grp.mean = mean(Sepal.Length))

ggplot(iris, aes(Sepal.Length, color = Species)) +
  geom_density() +
  geom_vline(aes(xintercept = grp.mean, color = Species),
             data = mu, linetype = 2) +
  scale_color_viridis_d()

Histogram

Basic histograms

# Basic histogram with mean line
ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(bins = 20, fill = "white", color = "black")  +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

# Add density curves
ggplot(iris, aes(Sepal.Length, stat(density))) +
  geom_histogram(bins = 20, fill = "white", color = "black")  +
  geom_density() +
  geom_vline(aes(xintercept = mean(Sepal.Length)), linetype = 2)

Change color by groups

ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(aes(fill = Species, color = Species), bins = 20, 
                 position = "identity", alpha = 0.5) +
  scale_fill_viridis_d() +
  scale_color_viridis_d()

QQ Plot

library(ggpubr)
ggqqplot(iris, x = "Sepal.Length",
   ggtheme = theme_bw())

Empirical cumulative distribution (ECDF)

ggplot(iris, aes(Sepal.Length)) +
  stat_ecdf(aes(color = Species)) +
  scale_color_viridis_d()

Density ridgeline plots

The density ridgeline plot is an alternative to the standard geom_density() function that can be useful for visualizing changes in distributions, of a continuous variable, over time or space. Ridgeline plots are partially overlapping line plots that create the impression of a mountain range.

library(ggridges)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(aes(fill = Species)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

Bar charts and alternatives

Data

df <- mtcars %>%
  rownames_to_column() %>%
  as_data_frame() %>%
  mutate(cyl = as.factor(cyl)) %>%
  select(rowname, wt, mpg, cyl)
df

## # A tibble: 32 x 4
##   rowname              wt   mpg cyl  
##   <chr>             <dbl> <dbl> <fct>
## 1 Mazda RX4          2.62  21   6    
## 2 Mazda RX4 Wag      2.88  21   6    
## 3 Datsun 710         2.32  22.8 4    
## 4 Hornet 4 Drive     3.22  21.4 6    
## 5 Hornet Sportabout  3.44  18.7 8    
## 6 Valiant            3.46  18.1 6    
## # ... with 26 more rows

Basic bar plots

# Basic bar plots
ggplot(df, aes(x = rowname, y = mpg)) +
  geom_col() +
  rotate_x_text(angle = 45)

# Reorder row names by mpg values
ggplot(df, aes(x = reorder(rowname, mpg), y = mpg)) +
  geom_col()  +
  rotate_x_text(angle = 45)

Horizontal bar plots

# Horizontal bar plots, 
# change fill color by groups and add text labels
ggplot(df, aes(x = reorder(rowname, mpg), y = mpg)) +
  geom_col( aes(fill = cyl)) + 
  geom_text(aes(label = mpg), nudge_y = 2) + 
  coord_flip() +
  scale_fill_viridis_d()

Order bars by groups and by mpg values

df2 <- df %>% 
  arrange(cyl, mpg) %>%
  mutate(rowname = factor(rowname, levels = rowname))

ggplot(df2, aes(x = rowname, y = mpg)) +
  geom_col( aes(fill = cyl)) + 
  scale_fill_viridis_d() +
  rotate_x_text(45)

Lollipop chart: Lollipop is an alternative to bar charts when you have large data sets.

ggplot(df2, aes(x = rowname, y = mpg)) +
  geom_segment(
    aes(x = rowname, xend = rowname, y = 0, yend = mpg), 
    color = "lightgray"
    ) + 
  geom_point(aes(color = cyl), size = 3) +
  scale_color_viridis_d() +
  theme_pubclean() +
  rotate_x_text(45)

Bar plot with multiple groups

# Data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("D0.5", "D1", "D2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))


# Stacked bar plots of y = counts by x = cut,
# colored by the variable color
ggplot(df3, aes(x = dose, y = len)) +
  geom_col(aes(color = supp, fill = supp), position = position_stack()) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))

# Use position = position_dodge() 
ggplot(df3, aes(x = dose, y = len)) +
  geom_col(aes(color = supp, fill = supp), position = position_dodge(0.8), width = 0.7) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))

Line plot

# Data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("D0.5", "D1", "D2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))

# Line plot
ggplot(df3, aes(x = dose, y = len, group = supp)) +
  geom_line(aes(linetype = supp)) +
  geom_point(aes(shape = supp))

Error bars

Data

# Raw data
df <- ToothGrowth %>% mutate(dose = as.factor(dose))
head(df, 3)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5

# Summary statistics
df.summary <- df %>%
  group_by(dose) %>%
  summarise(sd = sd(len, na.rm = TRUE), len = mean(len))
df.summary

## # A tibble: 3 x 3
##   dose     sd   len
##   <fct> <dbl> <dbl>
## 1 0.5    4.50  10.6
## 2 1      4.42  19.7
## 3 2      3.77  26.1

Basic line and bar plots with error bars

# (1) Line plot
ggplot(df.summary, aes(dose, len)) +
  geom_line(aes(group = 1)) +
  geom_errorbar( aes(ymin = len-sd, ymax = len+sd),width = 0.2) +
  geom_point(size = 2)

# (2) Bar plot
ggplot(df.summary, aes(dose, len)) +
  geom_bar(stat = "identity", fill = "lightgray", color = "black") +
  geom_errorbar(aes(ymin = len, ymax = len+sd), width = 0.2)

Grouped line/bar plots

# Data preparation
df.summary2 <- df %>%
  group_by(dose, supp) %>%
  summarise( sd = sd(len), len = mean(len))
df.summary2

## # A tibble: 6 x 4
## # Groups:   dose [?]
##   dose  supp     sd   len
##   <fct> <fct> <dbl> <dbl>
## 1 0.5   OJ     4.46 13.2 
## 2 0.5   VC     2.75  7.98
## 3 1     OJ     3.91 22.7 
## 4 1     VC     2.52 16.8 
## 5 2     OJ     2.66 26.1 
## 6 2     VC     4.80 26.1

# (1) Line plot + error bars
ggplot(df.summary2, aes(dose, len)) +
  geom_line(aes(linetype = supp, group = supp))+
  geom_point()+
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd, group = supp),
     width = 0.2
    )

# (2) Bar plots + upper error bars.
ggplot(df.summary2, aes(dose, len)) +
  geom_bar(aes(fill = supp), stat = "identity",
           position = position_dodge(0.8), width = 0.7)+
  geom_errorbar(
    aes(ymin = len, ymax = len+sd, group = supp),
    width = 0.2, position = position_dodge(0.8)
    )+
  scale_fill_manual(values = c("grey80", "grey30"))

Box plots and alternatives

Data

ToothGrowth$dose <- as.factor(ToothGrowth$dose)

Basic box plots

# Basic
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot()

# Box plot + violin plot
ggplot(ToothGrowth, aes(dose, len)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2)

Add jittered points and dot plot

# Add jittered points
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

# Dot plot + box plot
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", stackdir = "center")

Grouped plots

# Box plots
ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot(aes(color = supp)) +
  scale_color_viridis_d()

# Add jittered points
ggplot(ToothGrowth, aes(dose, len, color = supp)) +
  geom_boxplot() +
  geom_jitter(position = position_jitterdodge(jitter.width = 0.2)) +
  scale_color_viridis_d()

Time series data visualization

# Data preparation
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)
head(df, 3)

## # A tibble: 3 x 3
##   date       variable value
##   <date>     <chr>    <dbl>
## 1 1967-07-01 psavert   12.5
## 2 1967-08-01 psavert   12.5
## 3 1967-09-01 psavert   11.7

# Multiple line plot
ggplot(df, aes(x = date, y = value)) + 
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  theme_minimal()

scatter plot matrix

library(GGally)
ggpairs(iris[,-5])+ theme_bw()

Correlation analysis

library("ggcorrplot")
# Compute a correlation matrix
my_data <- mtcars[, c(1,3,4,5,6,7)]
corr <- round(cor(my_data), 1)
# Visualize
ggcorrplot(corr, p.mat = cor_pmat(my_data),
           hc.order = TRUE, type = "lower",
           color = c("#FC4E07", "white", "#00AFBB"),
           outline.col = "white", lab = TRUE)

Cluster analysis

library(factoextra)
USArrests %>%
  scale() %>%                           # Scale the data
  dist() %>%                            # Compute distance matrix
  hclust(method = "ward.D2") %>%        # Hierarchical clustering
  fviz_dend(cex = 0.5, k = 4, palette = "jco") # Visualize and cut 
                                              # into 4 groups

Balloon plot

Balloon plot is an alternative to bar plot for visualizing a large categorical data.

library(ggpubr)
# Data preparation
housetasks <- read.delim(
  system.file("demo-data/housetasks.txt", package = "ggpubr"),
  row.names = 1
  )
head(housetasks, 4)

##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7

# Visualization
ggballoonplot(housetasks, fill = "value")+
  scale_fill_viridis_c(option = "C")

Recommended for you

This section contains best data science and self-development resources to help you on your path.

Books - Data Science

Our Books

Practical Guide to Cluster Analysis in R by A. Kassambara (Datanovia)
Practical Guide To Principal Component Methods in R by A. Kassambara (Datanovia)
Machine Learning Essentials: Practical Guide in R by A. Kassambara (Datanovia)
R Graphics Essentials for Great Data Visualization by A. Kassambara (Datanovia)
GGPlot2 Essentials for Great Data Visualization in R by A. Kassambara (Datanovia)
Network Analysis and Visualization in R by A. Kassambara (Datanovia)
Practical Statistics in R for Comparing Groups: Numerical Variables by A. Kassambara (Datanovia)
Inter-Rater Reliability Essentials: Practical Guide in R by A. Kassambara (Datanovia)

Others

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham & Garrett Grolemund
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Géron
Practical Statistics for Data Scientists: 50 Essential Concepts by Peter Bruce & Andrew Bruce
Hands-On Programming with R: Write Your Own Functions And Simulations by Garrett Grolemund & Hadley Wickham
An Introduction to Statistical Learning: with Applications in R by Gareth James et al.
Deep Learning with R by François Chollet & J.J. Allaire
Deep Learning with Python by François Chollet

1 2 1 1

Version: Français

SFer

26 Feb 2019

Kassambara
– thanks for this great reference!.

Q:
In the 1st example,
what would the code be
to print (as top legend):
R and R2 and p ?
(right now, the ex. only shows R and P…).

Thanks,
SFer

Sfer

Q:
In the 1st example,
what would the code be
to print (as top legend):
R and R2 and P ?
(right now, the ex. only shows: R and P…).

Thanks,
SFer
San Francisco
—

Kassambara

Thank Sfer for your feedback!

Please try the following R code:

library("ggpubr")
ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_smooth(method = lm) +
  stat_cor(
    aes(label = paste(..r.label.., ..rr.label.., ..p.label.., sep = "~`,`~")),
    method = "pearson", label.x = 20
    )

The output should look like this:

Scatter Plot with R, R2 and p-value

GGPlot Examples Best Reference

GGPlot Examples Best Reference

Related Book

Prerequisites

Scatter plot

Distribution

Density plot

Histogram

QQ Plot

Empirical cumulative distribution (ECDF)

Density ridgeline plots

Bar charts and alternatives

Line plot

Error bars

Box plots and alternatives

Time series data visualization

scatter plot matrix

Correlation analysis

Cluster analysis

Balloon plot

Recommended for you

Books - Data Science

Our Books

Others

Comments ( 3 )

Give a comment Cancel reply