Introduction
The furrr package extends the functionality of purrr by enabling parallel processing using futures. This means you can easily transform your sequential, tidyverse-friendly workflows into parallelized ones, speeding up your computations without sacrificing code readability. In this tutorial, you’ll learn how to set up furrr, configure future plans for parallel execution, and apply functions such as future_map() to process data concurrently.
Installing and Loading furrr
First, ensure that you have furrr installed. You can install it from CRAN if necessary:
#| label: install-furrr
install.packages("furrr")
# Load the furrr package
library(furrr)
Setting Up a Future Plan
Before using furrr functions, you need to set a future plan that determines how tasks are distributed. For example, to run tasks in parallel using multiple sessions:
#| label: set-future-plan
library(future)
# Set the plan to use multisession
plan(multisession, workers = availableCores() - 1)
This sets the plan to use multiple sessions, with the number of workers equal to the available cores minus one. This setting is suitable for most local machines.
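If you want to check how many workers the current plan provides, or temporarily drop back to sequential execution while debugging, the future package includes helpers for both. The snippet below is a minimal sketch using nbrOfWorkers() and plan(sequential) from future:
#| label: inspect-plan
# Report how many workers the current plan provides
nbrOfWorkers()
# Temporarily revert to sequential execution (handy for debugging)
plan(sequential)
# Switch back to parallel execution when ready
plan(multisession, workers = availableCores() - 1)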
Using furrr Functions
The core function provided by furrr is future_map(), which works similarly to purrr::map() but executes operations in parallel.
Example: Parallel Mapping
#| label: future-map-example
# Create a simple vector
numbers <- 1:10
# Compute the square of each number in parallel
squared_numbers <- future_map(numbers, ~ .x^2)
print(squared_numbers)
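As with purrr::map(), the result is a list, and you can pass a named function instead of the ~ formula shorthand. The short sketch below also uses future_map()’s .progress argument to display a simple progress indicator for longer jobs; the square() helper is just an illustrative name:
#| label: future-map-progress
# A named function works just as well as the formula shorthand
square <- function(x) x^2
squared_again <- future_map(numbers, square, .progress = TRUE)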
Example: Returning Specific Types with future_map_dbl()
If you expect numeric output, you can use future_map_dbl() to return a double vector:
#| label: future-map-dbl-example
# Compute square roots in parallel, ensuring a numeric vector output
sqrt_values <- future_map_dbl(numbers, ~ sqrt(.x))
print(sqrt_values)
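future_map_dbl() is one of several type-stable variants; furrr mirrors purrr’s family, so future_map_int(), future_map_chr(), and future_map_lgl() are available as well. A brief sketch:
#| label: typed-variants-example
# Integer and character outputs, each returned as an atomic vector
name_lengths <- future_map_int(c("a", "bb", "ccc"), nchar)
labels <- future_map_chr(numbers, ~ paste0("item_", .x))
print(name_lengths)
print(labels)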
Comparing purrr and furrr Workflows
To illustrate the performance benefits of parallel processing, let’s compare a standard purrr workflow with its parallelized version using furrr. In the examples below, we’ll simulate a computationally intensive task using Sys.sleep().
Standard purrr Workflow
#| label: purrr-workflow
library(purrr)
# Define a computationally intensive function
heavy_computation <- function(x) {
  # Simulate a time-consuming task
  Sys.sleep(6)
  x^2
}
# Sequential execution using purrr::map
seq_time <- system.time({
  seq_result <- map(1:10, heavy_computation)
})
print("Sequential purrr execution time:")
print(seq_time)
Output: Sequential purrr execution time
user system elapsed
0.188 0.000 60.226
Parallelized Workflow with furrr
#| label: furrr-workflow
library(furrr)
# Set the plan to use multisession
plan(multisession, workers = availableCores() - 1)
# Parallel execution using furrr::future_map
par_time <- system.time({
  par_result <- future_map(1:10, heavy_computation)
})
print("Parallel furrr execution time:")
print(par_time)
Output: Parallel furrr execution time
user system elapsed
4.973 0.083 27.949
When comparing performance, it’s important to focus on elapsed (wall-clock) time rather than just user CPU time.
In our benchmarks, the sequential workflow using purrr took about 60.226 seconds of elapsed time, whereas the parallelized version with furrr completed in only 27.949 seconds.
Although the furrr approach showed higher user CPU time due to the concurrent use of multiple cores, the key takeaway is that the overall wait time experienced by the user was almost halved.
This clearly demonstrates the benefit of parallel processing for reducing the total execution time of resource-intensive tasks.
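Once a parallel batch (or a benchmark like the one above) is finished, it is good practice to release the background worker processes by reverting to a sequential plan:
#| label: reset-plan
# Shut down the multisession workers and return to sequential execution
plan(sequential)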
Best Practices
Set an Appropriate Future Plan: Choose a future plan that matches your system capabilities (e.g., multisession for local parallel processing).
Monitor Resource Usage: Parallel processing can consume significant resources. Adjust the number of workers using availableCores() to ensure your system remains responsive.
Test Sequentially First: Before parallelizing your code with furrr, test it sequentially using purrr to ensure correctness. Then switch to furrr for performance improvements.
Error Handling: Consider using error-handling wrappers (e.g., safely() or possibly()) from purrr in conjunction with furrr to manage potential errors in parallel tasks; a sketch follows this list.
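To illustrate the error-handling point, here is a minimal sketch in which purrr::possibly() wraps the worker function before it is passed to future_map_dbl(), so a failing element returns a fallback value instead of aborting the whole parallel job. The risky_log() function is purely illustrative:
#| label: error-handling-example
library(purrr)
library(furrr)
plan(multisession, workers = 2)
# An illustrative function that errors on non-positive input
risky_log <- function(x) {
  if (x <= 0) stop("x must be positive")
  log(x)
}
# possibly() returns the `otherwise` value instead of raising an error
safe_log <- possibly(risky_log, otherwise = NA_real_)
results <- future_map_dbl(c(10, -1, 100), safe_log)
print(results)  # NA for the failing element, log values for the rest
plan(sequential)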
Conclusion
furrr offers a seamless way to upgrade your tidyverse workflows with parallel processing capabilities. By setting up a future plan and using functions like future_map() and future_map_dbl(), you can significantly improve the performance of your R code without sacrificing readability. The comparison between purrr and furrr workflows illustrates the potential speed gains achievable through parallelization.
Happy coding, and enjoy leveraging furrr to accelerate your R workflows!
Citation
@online{kassambara2024,
author = {Kassambara, Alboukadel},
title = {Parallel {Processing} in {R} with Furrr},
date = {2024-02-10},
url = {https://www.datanovia.com/learn/programming/r/advanced/furrr-for-parallel-processing.html},
langid = {en}
}