Course description
In this course, you will learn how to easily perform data manipulation using R software. We’ll cover the following data manipulation techniques:
- filtering and ordering rows,
- renaming and adding columns,
- computing summary statistics
We’ll mainly use the popular dplyr R package, which contains the key R functions for carrying out data manipulation easily. In the final section, we’ll show you how to group your data by a grouping variable and then compute summary statistics on each subset. You will also learn how to chain your data manipulation operations.
At the end of this course, you will be familiar with data manipulation tools and approaches that will allow you to efficiently manipulate data.
Required R packages
We recommend installing the tidyverse packages, which include the dplyr package (for data manipulation) and additional R packages for easily reading (readr), transforming (tidyr) and visualizing (ggplot2) datasets.
- Install:
install.packages("tidyverse")
- Load the tidyverse packages, which also include the dplyr package:
library("tidyverse")
Demo datasets
We’ll mainly use the R built-in iris data set, which we start by converting into a tibble data frame (tbl_df) for easier data analysis. A tbl_df object is a data frame with a nicer printing method, which is useful when working with large data sets.
library("tidyverse")
my_data <- as_tibble(iris)
my_data
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## # ... with 144 more rows
Note that the type of data in each column is specified. Common types include:
- int: integers
- dbl: double (real numbers),
- chr: character vectors, strings, texts
- fct: factor,
- dttm: date-times (date + time)
- lgl: logical (TRUE or FALSE)
- date: dates
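The abbreviations in the tibble header correspond to ordinary R column classes. As a minimal sketch, assuming the iris tibble built above (loaded here through dplyr only, to keep the example light), you can list the class of each column like this:

```r
library(dplyr)  # dplyr re-exports as_tibble(); library("tidyverse") works too

my_data <- as_tibble(iris)

# Look up the first class of every column; vapply() guarantees
# a named character vector as the result.
column_classes <- vapply(my_data, function(col) class(col)[1], character(1))
column_classes
```

The four measurement columns are reported as "numeric" (shown as dbl in the tibble header) and Species as "factor" (shown as fct).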
Main data manipulation functions
There are 8 fundamental data manipulation verbs that you will use for most of your data manipulation. These functions are included in the dplyr package:
- filter(): Pick rows (observations/samples) based on their values.
- distinct(): Remove duplicate rows.
- arrange(): Reorder the rows.
- select(): Select columns (variables) by their names.
- rename(): Rename columns.
- mutate() and transmute(): Add/create new variables.
- summarise(): Compute statistical summaries (e.g., computing the mean or the sum).
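As a quick illustration, here is one call per verb on the iris tibble. This is a minimal sketch: the thresholds and the new column names (sepal_length, sepal_ratio, mean_sepal_length) are arbitrary choices made for the example.

```r
library(dplyr)

my_data <- as_tibble(iris)

# filter(): keep rows where Sepal.Length is greater than 7
long_sepals <- filter(my_data, Sepal.Length > 7)

# distinct(): drop duplicate rows
unique_rows <- distinct(my_data)

# arrange(): sort rows by Sepal.Length, largest first
sorted <- arrange(my_data, desc(Sepal.Length))

# select(): keep only the named columns
two_cols <- select(my_data, Sepal.Length, Species)

# rename(): the syntax is new_name = old_name
renamed <- rename(my_data, sepal_length = Sepal.Length)

# mutate(): add a column and keep the existing ones
with_ratio <- mutate(my_data, sepal_ratio = Sepal.Length / Sepal.Width)

# transmute(): create columns and drop all the others
ratio_only <- transmute(my_data, sepal_ratio = Sepal.Length / Sepal.Width)

# summarise(): collapse the data to a single summary row
overall <- summarise(my_data, mean_sepal_length = mean(Sepal.Length))
```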
It’s also possible to combine each of these verbs with the function group_by() to operate on subsets of the data set (group-by-group).
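For instance, grouping the iris tibble by Species before calling summarise() gives one summary row per species. This is a minimal sketch; the output column names n and mean_petal_length are chosen for the example.

```r
library(dplyr)

my_data <- as_tibble(iris)

# Group the rows by species; the data itself is unchanged,
# but later verbs now operate group by group.
by_species <- group_by(my_data, Species)

# Count the rows and average Petal.Length within each group
species_summary <- summarise(
  by_species,
  n = n(),
  mean_petal_length = mean(Petal.Length)
)
species_summary
```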
All these functions work in a similar way:
- The first argument is a data frame
- The subsequent arguments are a comma-separated list of unquoted variable names and the specification of what you want to do
- The result is a new data frame
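The three points above can be sketched with a single filter() call. The conditions are arbitrary examples; several conditions separated by commas are combined with a logical AND.

```r
library(dplyr)

my_data <- as_tibble(iris)

# First argument: the data frame.
# Following arguments: conditions written with unquoted column names.
result <- filter(my_data, Species == "setosa", Sepal.Width > 3.5)

# The result is a new data frame; my_data itself is not modified.
nrow(my_data)
nrow(result)
```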
You will learn how to use these functions, as well as how to chain your data manipulation operations using the pipe operator (%>%).
Note that the dplyr package lets you use the forward-pipe operator (%>%) to combine multiple operations. For example, x %>% f is equivalent to f(x). With the pipe, the output of each operation is passed as the first argument to the next operation, which makes a chain of operations easier to read than nested function calls.
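To make the equivalence concrete, the sketch below writes the same three-step computation twice, once as nested calls and once as a pipeline. The filter threshold and the column name mean_width are arbitrary choices for the example.

```r
library(dplyr)

my_data <- as_tibble(iris)

# Nested calls: read from the inside out
nested <- summarise(
  group_by(filter(my_data, Sepal.Length > 5), Species),
  mean_width = mean(Sepal.Width)
)

# Same steps with the pipe: read from top to bottom
piped <- my_data %>%
  filter(Sepal.Length > 5) %>%
  group_by(Species) %>%
  summarise(mean_width = mean(Sepal.Width))

# Both versions compute the same per-species means
identical(nested$mean_width, piped$mean_width)
```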
How can I put/display the first column from numeric to text?
You can simply use this:
or use dplyr verbs and specify the column by name:
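A minimal sketch of both approaches, assuming the iris tibble from the course as stand-in data (the name my_data and the converted column are illustrative, not taken from the original reply):

```r
library(dplyr)

my_data <- as_tibble(iris)

# Base R: convert the first column by position
base_version <- my_data
base_version[[1]] <- as.character(base_version[[1]])

# dplyr: convert the same column by name with mutate()
dplyr_version <- my_data %>%
  mutate(Sepal.Length = as.character(Sepal.Length))
```

In both cases the column is printed as chr instead of dbl.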
I am trying to put my data in a format compatible with HiClimR, like the TestCase of the package.
My data is in NetCDF format, which I read using the commands below:
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
pr <- ncvar_get(nc, "pre")
How can I create a data frame compatible with HiClimR, similar to the TestCase in the package?
Your question is very specific to HiClimR. You need to refer to the package documentation.
I have a matrix whose column names are years stored as dates, but as.Date expects something like %y%m%d. How can I rename the columns to %Y only, as a date rather than as a character?
For example, 2001-01-01 renamed as 2001.
Hi, do the courses only have text, no video?
Hi, there is no video for the course.
How can I start the lessons?
May I know how you create those green chunks and that check mark at the top left corner?
Oh, and how do you add those square icons for the unordered list?