11.3 Case study: Creating your own function operators

meomoise() and safely() are very useful but also quite complex. In this case study you’ll learn how to create your own simpler function operators. Imagine you have a named vector of URLs and you’d like to download each one to disk. That’s pretty simple with walk2() and file.download():

urls <- c(
  "adv-r" = "https://adv-r.hadley.nz", 
  "r4ds" = "http://r4ds.had.co.nz/"
  # and many many more
)
path <- paste(tempdir(), names(urls), ".html")

walk2(urls, path, download.file, quiet = TRUE)

This approach is fine for a handful of URLs, but as the vector gets longer, you might want to add a couple more features:

  • Add a small delay between each request to avoid hammering the server.

  • Display a . every few URLs so that we know that the function is still working.

It’s relatively easy to add these extra features if we’re using a for loop:

for(i in seq_along(urls)) {
  Sys.sleep(0.1)
  if (i %% 10 == 0) cat(".")
  download.file(urls[[i]], paths[[i]])
}

I think this for loop is suboptimal because it interleaves different concerns: pausing, showing progress, and downloading. This makes the code harder to read, and it makes it harder to reuse the components in new situations. Instead, let’s see if we can use function operators to extract out pausing and showing progress and make them reusable.

First, let’s write a function operator that adds a small delay. I’m going to call it delay_by() for reasons that will be more clear shortly, and it has two arguments: the function to wrap, and the amount of delay to add. The actual implementation is quite simple. The main trick is forcing evaluation of all arguments as described in Section 10.2.5, because function operators are a special type of function factory:

delay_by <- function(f, amount) {
  force(f)
  force(amount)
  
  function(...) {
    Sys.sleep(amount)
    f(...)
  }
}
system.time(runif(100))
#>    user  system elapsed 
#>       0       0       0
system.time(delay_by(runif, 0.1)(100))
#>    user  system elapsed 
#>   0.000   0.000   0.101

And we can use it with the original walk2():

walk2(urls, path, delay_by(download.file, 0.1), quiet = TRUE)

Creating a function to display the occasional dot is a little harder, because we can no longer rely on the index from the loop. We could pass the index along as another argument, but that breaks encapsulation: a concern of the progress function now becomes a problem that the higher level wrapper needs to handle. Instead, we’ll use another function factory trick (from Section 10.2.4), so that the progress wrapper can manage its own internal counter:

dot_every <- function(f, n) {
  force(f)
  force(n)
  
  i <- 0
  function(...) {
    i <<- i + 1
    if (i %% n == 0) cat(".")
    f(...)
  }
}
walk(1:100, runif)
walk(1:100, dot_every(runif, 10))
#> ..........

Now we can express our original for loop as:

walk2(
  urls, path, 
  dot_every(delay_by(download.file, 0.1), 10), 
  quiet = TRUE
)

This is starting to get a little hard to read because we are composing many function calls, and the arguments are getting spread out. One way to resolve that is to use the pipe:

walk2(
  urls, path, 
  download.file %>% dot_every(10) %>% delay_by(0.1), 
  quiet = TRUE
)

The pipe works well here because I’ve carefully chosen the function names to yield an (almost) readable sentence: take download.file then (add) a dot every 10 iterations, then delay by 0.1s. The more clearly you can express the intent of your code through function names, the more easily others (including future you!) can read and understand the code.

11.3.1 Exercises

  1. Weigh the pros and cons of download.file %>% dot_every(10) %>% delay_by(0.1) versus download.file %>% delay_by(0.1) %>% dot_every(10).

  2. Should you memoise file.download()? Why or why not?

  3. Create a function operator that reports whenever a file is created or deleted in the working directory, using dir() and setdiff(). What other global function effects might you want to track?

  4. Write a function operator that logs a timestamp and message to a file every time a function is run.

  5. Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(1, f); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.