Introduction
The furrr package extends the functionality of purrr by enabling parallel processing using futures. This means you can easily transform your sequential, tidyverse-friendly workflows into parallelized ones, speeding up your computations without sacrificing code readability. In this tutorial, you’ll learn how to set up furrr, configure future plans for parallel execution, and apply functions such as future_map() to process data concurrently.
Installing and Loading furrr
First, ensure that you have furrr installed. You can install it from CRAN if necessary:
#| label: install-furrr
install.packages("furrr")
# Load the furrr package
library(furrr)
Setting Up a Future Plan
Before using furrr functions, you need to set a future plan that determines how tasks are distributed. For example, to run tasks in parallel using multiple sessions:
#| label: set-future-plan
library(future)
# Set the plan to use multisession
plan(multisession, workers = availableCores() - 1)
This sets the plan to use multiple sessions, with the number of workers equal to the available cores minus one. This setting is suitable for most local machines.
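If you want to check how many workers the current plan provides, or temporarily drop back to sequential execution while debugging, the future package includes helpers for both. The snippet below is a minimal sketch using nbrOfWorkers() and plan(sequential) from future:
#| label: inspect-plan
# Report how many workers the current plan provides
nbrOfWorkers()
# Temporarily revert to sequential execution (handy for debugging)
plan(sequential)
# Switch back to parallel execution when ready
plan(multisession, workers = availableCores() - 1)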
Using furrr Functions
The core function provided by furrr is future_map(), which works similarly to purrr::map() but executes operations in parallel.
Example: Parallel Mapping
#| label: future-map-example
# Create a simple vector
numbers <- 1:10
# Compute the square of each number in parallel
squared_numbers <- future_map(numbers, ~ .x^2)
print(squared_numbers)
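As with purrr::map(), the result is a list, and you can pass a named function instead of the ~ formula shorthand. The short sketch below also uses future_map()’s .progress argument to display a simple progress indicator for longer jobs; the square() helper is just an illustrative name:
#| label: future-map-progress
# A named function works just as well as the formula shorthand
square <- function(x) x^2
squared_again <- future_map(numbers, square, .progress = TRUE)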
Example: Returning Specific Types with future_map_dbl()
If you expect numeric output, you can use future_map_dbl() to return a double vector:
#| label: future-map-dbl-example
# Compute square roots in parallel, ensuring a numeric vector output
sqrt_values <- future_map_dbl(numbers, ~ sqrt(.x))
print(sqrt_values)
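future_map_dbl() is one of several type-stable variants; furrr mirrors purrr’s family, so future_map_int(), future_map_chr(), and future_map_lgl() are available as well. A brief sketch:
#| label: typed-variants-example
# Integer and character outputs, each returned as an atomic vector
name_lengths <- future_map_int(c("a", "bb", "ccc"), nchar)
labels <- future_map_chr(numbers, ~ paste0("item_", .x))
print(name_lengths)
print(labels)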
Comparing purrr and furrr Workflows
To illustrate the performance benefits of parallel processing, let’s compare a standard purrr workflow with its parallelized version using furrr. In the examples below, we’ll simulate a computationally intensive task using Sys.sleep().
Standard purrr Workflow
#| label: purrr-workflow
library(purrr)
# Define a computationally intensive function
heavy_computation <- function(x) {
  # Simulate a time-consuming task
  Sys.sleep(6)
  x^2
}
# Sequential execution using purrr::map
seq_time <- system.time({
  seq_result <- map(1:10, heavy_computation)
})
print("Sequential purrr execution time:")
print(seq_time)
Output: Sequential purrr execution time
user system elapsed
0.188 0.000 60.226
Parallelized Workflow with furrr
#| label: furrr-workflow
library(furrr)
# Set the plan to use multisession
plan(multisession, workers = availableCores() - 1)
# Parallel execution using furrr::future_map
par_time <- system.time({
  par_result <- future_map(1:10, heavy_computation)
})
print("Parallel furrr execution time:")
print(par_time)
Output: Parallel furrr execution time
user system elapsed
4.973 0.083 27.949
When comparing performance, it’s important to focus on elapsed (wall-clock) time rather than just user CPU time.
In our benchmarks, the sequential workflow using purrr took about 60.226 seconds of elapsed time, whereas the parallelized version with furrr completed in only 27.949 seconds.
Although the furrr approach showed higher user CPU time due to the concurrent use of multiple cores, the key takeaway is that the overall wait time experienced by the user was almost halved.
This clearly demonstrates the benefit of parallel processing for reducing the total execution time of resource-intensive tasks.
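Once a parallel batch (or a benchmark like the one above) is finished, it is good practice to release the background worker processes by reverting to a sequential plan:
#| label: reset-plan
# Shut down the multisession workers and return to sequential execution
plan(sequential)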
Best Practices
Set an Appropriate Future Plan: Choose a future plan that matches your system capabilities (e.g., multisession for local parallel processing).
Monitor Resource Usage: Parallel processing can consume significant resources. Adjust the number of workers using availableCores() to ensure your system remains responsive.
Test Sequentially First: Before parallelizing your code with furrr, test it sequentially using purrr to ensure correctness. Then switch to furrr for performance improvements.
Error Handling: Consider using error-handling wrappers (e.g., safely() or possibly()) from purrr in conjunction with furrr to manage potential errors in parallel tasks; a sketch follows this list.
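To illustrate the error-handling point, here is a minimal sketch in which purrr::possibly() wraps the worker function before it is passed to future_map_dbl(), so a failing element returns a fallback value instead of aborting the whole parallel job. The risky_log() function is purely illustrative:
#| label: error-handling-example
library(purrr)
library(furrr)
plan(multisession, workers = 2)
# An illustrative function that errors on non-positive input
risky_log <- function(x) {
  if (x <= 0) stop("x must be positive")
  log(x)
}
# possibly() returns the `otherwise` value instead of raising an error
safe_log <- possibly(risky_log, otherwise = NA_real_)
results <- future_map_dbl(c(10, -1, 100), safe_log)
print(results)  # NA for the failing element, log values for the rest
plan(sequential)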
Conclusion
furrr offers a seamless way to upgrade your tidyverse workflows with parallel processing capabilities. By setting up a future plan and using functions like future_map() and future_map_dbl(), you can significantly improve the performance of your R code without sacrificing readability. The comparison between purrr and furrr workflows illustrates the potential speed gains achievable through parallelization.
Happy coding, and enjoy leveraging furrr to accelerate your R workflows!
Citation
@online{kassambara2024,
author = {Kassambara, Alboukadel},
title = {Parallel {Processing} in {R} with Furrr},
date = {2024-02-10},
url = {https://www.datanovia.com/learn/programming/r/advanced/furrr-for-parallel-processing.html},
langid = {en}
}