Functional Programming in R using purrr
Vamshidhar Gangu, Research Computing, NUS Information Technology.
Functional programming (FP) is a programming paradigm rooted in lambda calculus, and it is well suited to data science. In simple terms, FP is what it sounds like: if you do something more than once, write it as a function. Functions then become the primary means by which you carry out tasks. Several recent software architectures, such as microservices and serverless computing (FaaS – Function as a Service), are inspired by this approach.
In functional programming, your code is organised into functions that perform the operations you need. Your scripts then become a sequence of calls to these functions, which makes them easier to understand. R is not a pure functional programming language, so applying FP principles takes some self-discipline. The purrr package extends base R's functional programming capabilities with a consistent set of tools. Here we look at some of the functions that make FP easier in R.
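As a small illustration of the "write a function instead of repeating yourself" idea, the sketch below rescales two vectors to the range 0 to 1. The helper name rescale01 and the sample data are assumptions made for this example only.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up sample data
heights <- c(150, 160, 170, 180)
weights <- c(50, 65, 80)

# The script becomes a sequence of calls to the same function
heights_scaled <- rescale01(heights)
weights_scaled <- rescale01(weights)
```

Because the logic lives in one place, a fix to rescale01() (for example, handling a constant vector) applies to every call.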
Introduction to purrr
The purrr package provides a set of tools for applying a function to each item in a vector or list. The best place to start is the family of map() functions, which allow you to replace many for-loops with code that is both more succinct and easier to read.
map*() family
A map function is one that applies the same action/function to every element of an object (e.g., each entry of a list or a vector, or each of the columns of a data frame).
If you're familiar with the base R apply() functions, then you are already familiar with map functions, even if you didn't know it. The apply() functions are a set of very useful base R functions for iteratively performing an action across the entries of a vector or list without writing a for-loop. There is nothing fundamentally wrong with them, but their syntax is somewhat inconsistent from one function to the next, and the type of object they return is often hard to predict (sapply() in particular).
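The return-type ambiguity of sapply() can be seen in a short sketch. The anonymous functions below are made up for illustration; the point is only that sapply() chooses its return type from the results it happens to get, whereas map_dbl() either returns a double vector or stops with an error.

```r
library(purrr)

# sapply() picks its return type from the results it happens to get
v <- sapply(1:3, function(i) i * 2)         # numeric vector
m <- sapply(1:3, function(i) c(i, i^2))     # 2 x 3 matrix
l <- sapply(1:3, function(i) seq_len(i))    # list (result lengths differ)
e <- sapply(integer(0), function(i) i * 2)  # empty list, not numeric(0)

# map_dbl() always returns a double vector, even for empty input
d  <- map_dbl(1:3, function(i) i * 2)
d0 <- map_dbl(integer(0), function(i) i * 2)
```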
The naming convention of the map functions is such that the type of output is given by the suffix that follows the underscore in the function name.
- map(.x, .f) is the main mapping function and returns a list
- map_df(.x, .f) returns a data frame
- map_dbl(.x, .f) returns a numeric (double) vector
- map_chr(.x, .f) returns a character vector
- map_lgl(.x, .f) returns a logical vector
- map_int(.x, .f) returns an integer vector
Because the first argument is always the data, map functions work well with the pipe (%>%), in keeping with the rest of the tidyverse syntax.
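The suffix convention and the pipe can be combined as in the following sketch. The scores list is made-up example data, and the ~ notation is purrr's formula shorthand for an anonymous function whose argument is .x.

```r
library(purrr)
library(magrittr)  # provides %>%

# Made-up example data: scores for three groups
scores <- list(a = c(70, 80, 90), b = c(55, 65), c = c(88, 92, 96, 100))

as_list  <- scores %>% map(mean)                             # list
means    <- scores %>% map_dbl(mean)                         # double vector
sizes    <- scores %>% map_int(length)                       # integer vector
above_75 <- scores %>% map_lgl(~ mean(.x) > 75)              # logical vector
joined   <- scores %>% map_chr(~ paste(.x, collapse = "-"))  # character vector
```

Each result keeps the names of the input list, so means["b"] gives the mean for group b.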
map example
# Install the tidyverse (only needed once), then load it
install.packages("tidyverse", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library(tidyverse)
# example data
moons <-
  list(
    earth = 1737.1,
    mars = c(11.3, 6.2),
    neptune =
      c(60.4, 81.4, 156, 174.8, 194, 34.8, 420, 2705.2, 340, 62, 44, 42, 40, 60)
  )
# getting length of each list element
map(moons, length)
# finding the median of each list element
map(moons, median)
## map_dbl returns vector of doubles
map_dbl(moons, median)
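# map_int() fails here because the medians are doubles, not integers
# (the exact wording of the error varies by purrr version)
map_int(moons, median)
#> Error: Can't coerce element 1 from a double to a integer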
# sorting each list element
map(moons, sort)
#> $earth
#> [1] 1737.1
#>
#> $mars
#> [1] 6.2 11.3
#>
#> $neptune
#> [1] 34.8 40.0 42.0 44.0 60.0 60.4 62.0 81.4 156.0 174.8
#> [11] 194.0 340.0 420.0 2705.2
# extra arguments after the function are passed on to it
map(moons, sort, decreasing = TRUE)
#> $earth
#> [1] 1737.1
#>
#> $mars
#> [1] 11.3 6.2
#>
#> $neptune
#> [1] 2705.2 420.0 340.0 194.0 174.8 156.0 81.4 62.0 60.4 60.0
#> [11] 44.0 42.0 40.0 34.8
Other functions
map2 – for mapping over two input objects in parallel. The first two arguments are the two objects you want to iterate over, and the third is the function, given either by name or as a formula (note the ~):
map2(.x = object1, .y = object2, .f = ~ plotFunction(.x, .y))
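The names object1, object2 and plotFunction in the call above are placeholders. A runnable sketch with made-up numeric vectors might look like this; map2_dbl() is the variant of map2() that returns a double vector.

```r
library(purrr)

# Made-up inputs of equal length
base_values <- c(2, 3, 4)
exponents   <- c(1, 2, 3)

# Pass a function directly: it is called as f(x, y) for each pair
powers_fn <- map2_dbl(base_values, exponents, function(x, y) x^y)

# Same computation with the formula shorthand, where .x and .y are the pair
powers <- map2_dbl(base_values, exponents, ~ .x^.y)
```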
map_if / map_at – apply a function only to some elements of a vector: map_if() selects them with a predicate function, map_at() by name or position
# Use a predicate function to decide whether to map a function:
map_if(iris, is.factor, as.character)
# Specify an alternative with the `.else` argument:
map_if(iris, is.factor, as.character, .else = as.integer)
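The examples above cover map_if(). For map_at(), a minimal sketch follows; the record list is made-up data used only for illustration.

```r
library(purrr)

# Made-up list with mixed element types
record <- list(name = "io", radius_km = 1821.6, mass_kg = 8.93e22)

# Transform only the elements selected by name
by_name <- map_at(record, c("radius_km", "mass_kg"), round)

# Or select the elements by position
by_pos <- map_at(record, 1, toupper)
```

Elements that are not selected are returned unchanged, so the result has the same length and names as the input.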
pmap() – similar to map2(), but instead of mapping across two vectors or lists, you can map across any number of them, supplied together as a single list.
pmap_chr(list(
  list(1, 2, 3),
  list("one", "two", "three"),
  list("uno", "dos", "tres")
), paste)
#> [1] "1 one uno" "2 two dos" "3 three tres"
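When the input list is named, pmap() matches the names to the arguments of the function. Since a data frame is a list of columns, this also gives a way to work row by row. The sketch below uses made-up data; params and moons_df are names chosen for this example.

```r
library(purrr)

# Names match the arguments of seq(): from, to, by
params <- list(from = c(1, 10), to = c(3, 30), by = c(1, 10))
seqs <- pmap(params, seq)

# A data frame is a list of columns, so pmap iterates over its rows
moons_df <- data.frame(
  planet = c("earth", "mars"),
  n_moons = c(1, 2),
  stringsAsFactors = FALSE
)
labels <- pmap_chr(
  moons_df,
  function(planet, n_moons) paste(planet, "has", n_moons, "moon(s)")
)
```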
The purrr package contains more functions than we can cover. The purrr cheatsheet is a great way to find helpful functions when you encounter a new type of iteration problem.