7  Data Analysis and Visualization

Keywords

R in VSCode, R programming in VSCode

7.1 Introduction

Visual Studio Code (VSCode), combined with R and a few extensions, provides a capable environment for data analysis and visualization. This chapter shows how to load, wrangle, and plot data with R inside VSCode, using the tidyverse packages for data wrangling and ggplot2 for visualization.

[Figure: Data Analysis and Visualization with R in VSCode]


7.2 Data Analysis

The vscode-R extension supports data analysis in VSCode: it sends code from the editor to an R terminal and adds viewers for data frames and plots, so you can work with R scripts and explore data interactively.

7.2.1 STEP 1. Loading Data

To load data in VSCode, you can use the R terminal integrated into the editor or write and run R scripts directly from the editor.

  • Loading CSV Files: Use read.csv() (base R) or readr::read_csv() to load CSV files. Place the cursor on a line, or select several lines, and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to run the code in the active R terminal.

    # Create a demo data file
    dir.create("data", showWarnings = FALSE)
    readr::write_csv(iris, "data/iris.csv")
    # Load the data
    data <- readr::read_csv("data/iris.csv")
  • Viewing Data: Use the View() function to open data frames in the interactive viewer provided by VSCode. This allows you to sort, filter, and explore the data directly in the editor.

    View(data)
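Before opening the viewer, a few console checks can confirm that a file loaded as expected. The following is a minimal sketch that uses base R only; the temporary file path and the name iris_base are illustrative and not part of the chapter's project.

```r
# Write the built-in iris data to a temporary CSV so the example is self-contained
csv_path <- file.path(tempdir(), "iris_demo.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Base R reader; stringsAsFactors = TRUE converts the Species column to a factor
iris_base <- read.csv(csv_path, stringsAsFactors = TRUE)

# Quick checks to run in the R terminal
str(iris_base)       # column types and the first few values
head(iris_base, 5)   # first five rows
summary(iris_base)   # per-column summaries
dim(iris_base)       # number of rows and columns
```

Note that readr::read_csv() returns a tibble and reads text columns as character vectors by default, so the str() output differs slightly between the two readers.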

7.2.2 STEP 2. Data Wrangling with tidyverse

The tidyverse is a collection of packages (including dplyr, tidyr, and readr) for data manipulation and transformation. In VSCode, you can use these tools to clean and prepare your dataset for analysis.

  • Filtering and Mutating Data: Use dplyr to filter and mutate data frames. You can run these commands interactively to see the output immediately in the R terminal.

    library(dplyr)
    filtered_data <- data %>%
      filter(Sepal.Length > 5) %>%
      mutate(Sepal.Ratio = Sepal.Length / Sepal.Width)
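  • Piping Commands: The %>% (pipe) operator passes the result of one step as the first argument of the next, so a sequence of operations reads top to bottom instead of as nested calls. In VSCode, you can select the first few steps of a pipeline (without the trailing %>%) and run them to inspect an intermediate result. R 4.1 and later also provide a native pipe, |>, that covers most of the same uses.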
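The filter and mutate steps above can be extended into a longer pipeline. This is a sketch rather than part of the chapter's workflow: it uses the built-in iris data frame instead of the loaded CSV, and the names species_summary, n, and mean_ratio are illustrative.

```r
library(dplyr)

# Chain filtering, a derived column, and a per-species summary in one pipeline
species_summary <- iris %>%
  filter(Sepal.Length > 5) %>%                       # keep longer sepals only
  mutate(Sepal.Ratio = Sepal.Length / Sepal.Width) %>%  # add the ratio column
  group_by(Species) %>%
  summarise(
    n = n(),                         # rows remaining per species
    mean_ratio = mean(Sepal.Ratio),  # average ratio per species
    .groups = "drop"                 # return an ungrouped result
  ) %>%
  arrange(desc(mean_ratio))          # largest mean ratio first

species_summary
```

Running only the lines up to mutate() shows the row-level data before it is collapsed, which is a convenient way to check each step.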

7.3 Data Visualization with ggplot2

Visualization is a key component of data analysis, and VSCode provides multiple ways to create, view, and interact with plots.

7.3.1 STEP 1. Creating Visualizations

The ggplot2 package builds plots in layers: you map variables to aesthetics with aes() and add geoms, labels, and themes with +. In VSCode, you can run ggplot2 code from the editor and view the resulting plots without leaving it.

  • Basic Plotting: Create a scatter plot to visualize relationships between variables.

    library(ggplot2)
    ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point()
  • Interactive Plot Viewing: With the httpgd package installed and the vscode-R setting r.plot.useHttpgd enabled, plots appear in the VSCode plot viewer, where you can zoom, export, or copy images directly from the viewer pane.

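    # Install once; there is no need to rerun this in every session
    install.packages("httpgd")
    # Use httpgd as the default graphics device for this session
    options(device = httpgd::hgd)
    # Or start an httpgd device explicitly
    httpgd::hgd()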
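Plots can also be exported from code rather than from the viewer, which helps when a script needs to produce the same file on every run. The sketch below uses ggplot2's ggsave(); the output path in a temporary directory and the name sepal_plot are assumptions for illustration.

```r
library(ggplot2)

# Build the plot and keep it in a variable so it can be saved or reused
sepal_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# Hypothetical output location; a temporary directory keeps the example self-contained
out_file <- file.path(tempdir(), "sepal_scatter.pdf")

# ggsave() chooses the graphics device from the file extension;
# width and height are in inches by default
ggsave(out_file, plot = sepal_plot, width = 6, height = 4)

file.exists(out_file)
```

Changing the extension (for example to .png or .svg) switches the output format, provided the matching graphics device is available in your R installation.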

7.3.2 STEP 2. Customizing Visuals

Customization is key to making your plots informative and visually appealing.

  • Adding Titles and Labels: Customize your plots by adding titles, axis labels, and adjusting themes.

    ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point() +
      labs(title = "Sepal Length vs Width",
           x = "Sepal Length (cm)",
           y = "Sepal Width (cm)") +
      theme_minimal()
  • Faceting: Use facet_wrap() or facet_grid() to create small multiples, which can help in understanding patterns across different subsets of data.

    ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point() +
      facet_wrap(~ Species)
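Because ggplot2 objects are composed with +, labels, themes, and facets can be layered onto a stored plot one step at a time. The following sketch combines the two examples above using the built-in iris data; the variable names base_plot, labelled_plot, and faceted_plot are illustrative.

```r
library(ggplot2)

# Start from a plain scatter plot
base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# Add titles, axis labels, and a theme without rebuilding the plot
labelled_plot <- base_plot +
  labs(title = "Sepal Length vs Width",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)") +
  theme_minimal()

# Split into one panel per species; the legend is redundant once panels are labelled
faceted_plot <- labelled_plot +
  facet_wrap(~ Species) +
  theme(legend.position = "none")

faceted_plot
```

Keeping the intermediate objects makes it easy to print each version in the plot viewer and compare them.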

7.4 Interactive Visualization Tools

VSCode, through the vscode-R extension, supports interactive visualizations that enhance data exploration.

  • Plot Viewer: The plot viewer in VSCode lets you interact with your visualizations. With httpgd, the viewer refreshes each time you run plotting code and keeps earlier plots available for comparison.

  • Htmlwidgets and Shiny Apps: htmlwidgets, such as plotly charts, and Shiny apps can also be displayed within VSCode, so you can explore data interactively without leaving the editor.

    # Example using plotly
    library(plotly)
    p <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point()
    ggplotly(p)
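For the Shiny case mentioned above, a minimal app might look like the sketch below. It is an illustration under assumptions, not the chapter's own app: the input IDs xvar and yvar, the output ID scatter, and the app title are made up for this example, and where the app opens (a VSCode viewer tab or an external browser) depends on your vscode-R settings.

```r
library(shiny)
library(ggplot2)

# Numeric columns of iris that can be placed on either axis
numeric_cols <- names(iris)[sapply(iris, is.numeric)]

# User interface: two dropdowns and a plot
ui <- fluidPage(
  titlePanel("Iris explorer"),
  selectInput("xvar", "X variable", choices = numeric_cols, selected = "Sepal.Length"),
  selectInput("yvar", "Y variable", choices = numeric_cols, selected = "Sepal.Width"),
  plotOutput("scatter")
)

# Server: redraw the scatter plot whenever a dropdown changes
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    ggplot(iris, aes(x = .data[[input$xvar]],
                     y = .data[[input$yvar]],
                     color = Species)) +
      geom_point() +
      labs(x = input$xvar, y = input$yvar)
  })
}

# Build the app object; it only starts when run explicitly
app <- shinyApp(ui = ui, server = server)

if (interactive()) {
  runApp(app)
}
```

The interactive() guard keeps the script from blocking when it is sourced non-interactively, for example in a batch job.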

7.5 Conclusion

Data analysis and visualization are central to any data science workflow, and VSCode paired with R supports both. With the vscode-R extension, httpgd for interactive plots, and packages such as the tidyverse and ggplot2, you can load, transform, and plot data in one place. The integrated R terminal, data viewer, and plot viewer keep each step, from data wrangling to visualization, inside the editor.