6.3 Function composition
Base R provides two ways to compose multiple function calls. For example, imagine you want to compute the population standard deviation using sqrt() and mean() as building blocks:
square <- function(x) x^2
deviation <- function(x) x - mean(x)
You either nest the function calls:
x <- runif(100)

sqrt(mean(square(deviation(x))))
#> [1] 0.274
Or you save the intermediate results as variables:
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
#> [1] 0.274
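As a sanity check on the building blocks, the composed result can be compared against base R's sd(), which divides by n - 1 rather than n; rescaling by sqrt((n - 1) / n) should recover the population value. The following is a minimal sketch that assumes a small fixed vector, y, chosen here for illustration so the output is reproducible (it is not the random x used above):

```r
square <- function(x) x^2
deviation <- function(x) x - mean(x)

# Hypothetical fixed input, chosen only so the result is reproducible
y <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(y)

# Population standard deviation via nested calls
pop_sd <- sqrt(mean(square(deviation(y))))
pop_sd
#> [1] 2

# sd() uses the n - 1 denominator, so rescale before comparing
all.equal(pop_sd, sd(y) * sqrt((n - 1) / n))
#> [1] TRUE
```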
The magrittr package (Bache and Wickham 2014) provides a third option: the binary operator %>%, which is called the pipe and is pronounced as “and then”.
library(magrittr)
x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()
#> [1] 0.274
x %>% f() is equivalent to f(x); x %>% f(y) is equivalent to f(x, y). The pipe allows you to focus on the high-level composition of functions rather than the low-level flow of data; the focus is on what’s being done (the verbs), rather than on what’s being modified (the nouns). This style is common in Haskell and F#, the main inspiration for magrittr, and is the default style in stack-based programming languages like Forth and Factor.
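The two equivalences can be checked directly. This is a small sketch that assumes magrittr is installed; the vector v and the choice of sqrt() and round() are illustrative only:

```r
library(magrittr)

# Hypothetical input for illustration
v <- c(1.234, 5.678, 9.012)

# x %>% f() is f(x)
identical(v %>% sqrt(), sqrt(v))
#> [1] TRUE

# x %>% f(y) is f(x, y): the left-hand side becomes the first argument
identical(v %>% round(1), round(v, 1))
#> [1] TRUE
```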
Each of the three options has its own strengths and weaknesses:
Nesting, f(g(x)), is concise, and well suited for short sequences. But longer sequences are hard to read because they are read inside out and right to left. As a result, arguments can get spread out over long distances, creating the Dagwood sandwich problem.

Intermediate objects, y <- f(x); g(y), require you to name intermediate objects. This is a strength when objects are important, but a weakness when values are truly intermediate.

Piping, x %>% f() %>% g(), allows you to read code in a straightforward left-to-right fashion and doesn’t require you to name intermediate objects. But you can only use it with linear sequences of transformations of a single object. It also requires an additional third-party package and assumes that the reader understands piping.
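To make the "linear sequences" limitation concrete, here is a sketch of a computation that is not a single chain: standardising a vector needs the deviations twice, once to compute the spread and once as the numerator. Naming the intermediate object is the natural fit here. The vector y and the names dev, pop_sd, and z are assumptions made for this illustration:

```r
# Hypothetical input for illustration
y <- c(2, 4, 4, 4, 5, 5, 7, 9)

# The deviations are used twice below, so they earn a name
dev <- y - mean(y)

# Population standard deviation, built from the named intermediate
pop_sd <- sqrt(mean(dev^2))

# Standardised values reuse dev as the numerator
z <- dev / pop_sd
z
#> [1] -1.5 -0.5 -0.5 -0.5  0.0  0.0  1.0  2.0
```

Because dev feeds two different steps, writing this as one pipe would mean either recomputing the deviations or stepping outside the simple x %>% f() %>% g() form.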
Most code will use a combination of all three styles. Piping is more common in data analysis code, as much of an analysis consists of a sequence of transformations of an object (like a data frame or plot). I tend to use piping infrequently in packages; not because it is a bad idea, but because it’s often a less natural fit.
References
Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R. http://magrittr.tidyverse.org/.