11.2 Existing function operators

There are two very useful function operators that will both help you solve common recurring problems, and give you a sense for what function operators can do: purrr::safely() and memoise::memoise().

11.2.1 Capturing errors with purrr::safely()

One advantage of for-loops is that if one of the iterations fails, you can still access all the results up to the failure:

x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"
)

out <- rep(NA_real_, length(x))
for (i in seq_along(x)) {
  out[[i]] <- sum(x[[i]])
}
#> Error in sum(x[[i]]): invalid 'type' (character) of argument
out
#> [1] 1.39 1.27 2.17   NA

If you do the same thing with a functional, you get no output, making it hard to figure out where the problem lies:

map_dbl(x, sum)
#> Error in .Primitive("sum")(..., na.rm = na.rm): invalid 'type' (character) of
#> argument

purrr::safely() provides a tool to help with this problem. safely() is a function operator that transforms a function to turn errors into data. (You can learn the basic idea that makes it work in Section 8.6.2.) Let’s start by taking a look at it outside of map_dbl():

safe_sum <- safely(sum)
safe_sum
#> function (...) 
#> capture_error(.f(...), otherwise, quiet)
#> <bytecode: 0x55b18b0e0918>
#> <environment: 0x55b18b0e0480>

Like all function operators, safely() takes a function and returns a wrapped function which we can call as usual:

str(safe_sum(x[[1]]))
#> List of 2
#>  $ result: num 1.39
#>  $ error : NULL
str(safe_sum(x[[4]]))
#> List of 2
#>  $ result: NULL
#>  $ error :List of 2
#>   ..$ message: chr "invalid 'type' (character) of argument"
#>   ..$ call   : language .Primitive("sum")(..., na.rm = na.rm)
#>   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

You can see that a function transformed by safely() always returns a list with two elements, result and error. If the function runs successfully, error is NULL and result contains the result; if the function fails, result is NULL and error contains the error.

Now lets use safely() with a functional:

out <- map(x, safely(sum))
str(out)
#> List of 4
#>  $ :List of 2
#>   ..$ result: num 1.39
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 1.27
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 2.17
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language .Primitive("sum")(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

The output is in a slightly inconvenient form, since we have four lists, each of which is a list containing the result and the error. We can make the output easier to use by turning it “inside-out” with purrr::transpose(), so that we get a list of results and a list of errors:

out <- transpose(map(x, safely(sum)))
str(out)
#> List of 2
#>  $ result:List of 4
#>   ..$ : num 1.39
#>   ..$ : num 1.27
#>   ..$ : num 2.17
#>   ..$ : NULL
#>  $ error :List of 4
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language .Primitive("sum")(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

Now we can easily find the results that worked, or the inputs that failed:

ok <- map_lgl(out$error, is.null)
ok
#> [1]  TRUE  TRUE  TRUE FALSE

x[!ok]
#> [[1]]
#> [1] "oops"

out$result[ok]
#> [[1]]
#> [1] 1.39
#> 
#> [[2]]
#> [1] 1.27
#> 
#> [[3]]
#> [1] 2.17

You can use this same technique in many different situations. For example, imagine you’re fitting a generalised linear model (GLM) to a list of data frames. GLMs can sometimes fail because of optimisation problems, but you still want to be able to try to fit all the models, and later look back at those that failed:

fit_model <- function(df) {
  glm(y ~ x1 + x2 * x3, data = df)
}

models <- transpose(map(datasets, safely(fit_model)))
ok <- map_lgl(models$error, is.null)

# which data failed to converge?
datasets[!ok]

# which models were successful?
models[ok]

I think this is a great example of the power of combining functionals and function operators: safely() lets you succinctly express what you need to solve a common data analysis problem.

purrr comes with three other function operators in a similar vein:

  • possibly(): returns a default value when there’s an error. It provides no way to tell if an error occured or not, so it’s best reserved for cases when there’s some obvious sentinel value (like NA).

  • quietly(): turns output, messages, and warning side-effects into output, message, and warning components of the output.

  • auto_browser(): automatically executes browser() inside the function when there’s an error.

See their documentation for more details.

11.2.2 Caching computations with memoise::memoise()

Another handy function operator is memoise::memoise(). It memoises a function, meaning that the function will remember previous inputs and return cached results. Memoisation is an example of the classic computer science tradeoff of memory versus speed. A memoised function can run much faster, but because it stores all of the previous inputs and outputs, it uses more memory.

Let’s explore this idea with a toy function that simulates an expensive operation:

slow_function <- function(x) {
  Sys.sleep(1)
  x * 10 * runif(1)
}
system.time(print(slow_function(1)))
#> [1] 0.808
#>    user  system elapsed 
#>       0       0       1

system.time(print(slow_function(1)))
#> [1] 8.34
#>    user  system elapsed 
#>   0.002   0.000   1.003

When we memoise this function, it’s slow when we call it with new arguments. But when we call it with arguments that it’s seen before it’s instantaneous: it retrieves the previous value of the computation.

fast_function <- memoise::memoise(slow_function)
system.time(print(fast_function(1)))
#> [1] 6.01
#>    user  system elapsed 
#>   0.001   0.000   1.002

system.time(print(fast_function(1)))
#> [1] 6.01
#>    user  system elapsed 
#>   0.015   0.000   0.015

A relatively realistic use of memoisation is computing the Fibonacci series. The Fibonacci series is defined recursively: the first two values are defined by convention, \(f(0) = 0\), \(f(1) = 1\), and then \(f(n) = f(n - 1) + f(n - 2)\) (for any positive integer). A naive version is slow because, for example, fib(10) computes fib(9) and fib(8), and fib(9) computes fib(8) and fib(7), and so on.

fib <- function(n) {
  if (n < 2) return(1)
  fib(n - 2) + fib(n - 1)
}
system.time(fib(23))
#>    user  system elapsed 
#>   0.046   0.004   0.050
system.time(fib(24))
#>    user  system elapsed 
#>   0.063   0.000   0.063

Memoising fib() makes the implementation much faster because each value is computed only once:

fib2 <- memoise::memoise(function(n) {
  if (n < 2) return(1)
  fib2(n - 2) + fib2(n - 1)
})
system.time(fib2(23))
#>    user  system elapsed 
#>   0.024   0.000   0.025

And future calls can rely on previous computations:

system.time(fib2(24))
#>    user  system elapsed 
#>   0.001   0.000   0.001

This is an example of dynamic programming, where a complex problem can be broken down into many overlapping subproblems, and remembering the results of a subproblem considerably improves performance.

Think carefully before memoising a function. If the function is not pure, i.e. the output does not depend only on the input, you will get misleading and confusing results. I created a subtle bug in devtools because I memoised the results of available.packages(), which is rather slow because it has to download a large file from CRAN. The available packages don’t change that frequently, but if you have an R process that’s been running for a few days, the changes can become important, and because the problem only arose in long-running R processes, the bug was very painful to find.

11.2.3 Exercises

  1. Base R provides a function operator in the form of Vectorize(). What does it do? When might you use it?

  2. Read the source code for possibly(). How does it work?

  3. Read the source code for safely(). How does it work?