18 Data, dots, details

18.1 What’s the pattern?

When you use ... in a function, where should you put it? It’s obvious that the data arguments must come first. But which should come next, the dots or the details? This pattern tells you to place ... between the data arguments (the required arguments that supply the key “data” to the function) and the details arguments (optional additional arguments that control the finer details of the function).

18.2 What are some examples?

Many functions in base R take data, then details, then dots:

args(unique)
#> function (x, incomparables = FALSE, ...) 
#> NULL
args(median)
#> function (x, na.rm = FALSE, ...) 
#> NULL

This doesn’t cause many problems because most people will fully spell out the names of details arguments. However, there are other summary functions that take the data via dots, then details.

sum(2, 3, 10)
#> [1] 15
prod(2, 3, 10)
#> [1] 60

args(sum)
#> function (..., na.rm = FALSE) 
#> NULL
args(prod)
#> function (..., na.rm = FALSE) 
#> NULL

This allows these functions to take any number of input vectors, but these two different interfaces make it very easy for users to construct calls that are technically valid, but that don’t return the desired result.

If you’re expecting median() to work like sum(), you might call it the same way:

median(2, 3, 10)
#> [1] 2

This silently returns an incorrect result because median() has arguments x, na.rm, and ..., so 2 is matched to x, 3 is matched to na.rm, and 10 is swallowed by the dots. The user must remember that median() (and mean() too!) requires data that is packed into a single vector.

median(c(2, 3, 10))
#> [1] 3
mean(c(2, 3, 10))
#> [1] 5
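
To see how R distributes the three values across the arguments, one option is to ask match.call() to resolve the call without evaluating it. This is only a small diagnostic sketch, not something the pattern itself requires:

```r
# Resolve the call against median()'s formals: x, na.rm, ...
match.call(median, quote(median(2, 3, 10)))
#> median(x = 2, na.rm = 3, 10)

# mean() has formals x and ..., so everything after the first value
# falls into the dots.
match.call(mean, quote(mean(2, 3, 10)))
#> mean(x = 2, 3, 10)
```

Only the first value is ever treated as data; the rest are consumed as details or dropped into the dots.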

18.3 Why is it important?

There are three primary advantages:

  • It forces the user of your function to fully name the detail arguments, because arguments that come after ... are never matched by position or partially by name. Using full names for details arguments is good practice, because it makes code easier to read.

  • You can easily add new detail arguments without changing the meaning of existing function calls. This makes it easy to extend your function with new capabilities, because you don’t need to worry about changing existing code.

  • When coupled with “inspect the dots” (Chapter 21) or “dot prefix” (Chapter 20), it minimises the chances that misspelled argument names will silently go astray.
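
The first point can be checked with two small helpers that differ only in where ... sits. Both functions (details_first and dots_first) are hypothetical and exist purely to report how their arguments were matched:

```r
# Hypothetical helpers: each returns the matched na.rm and whatever fell into ...
details_first <- function(x, na.rm = FALSE, ...) {
  list(na.rm = na.rm, dots = list(...))
}
dots_first <- function(x, ..., na.rm = FALSE) {
  list(na.rm = na.rm, dots = list(...))
}

# Details before the dots: matched by position and by partial name.
details_first(1, TRUE)$na.rm
#> [1] TRUE
details_first(1, na = TRUE)$na.rm
#> [1] TRUE

# Details after the dots: only the full name reaches na.rm.
dots_first(1, TRUE)$na.rm
#> [1] FALSE
dots_first(1, na = TRUE)$na.rm
#> [1] FALSE
dots_first(1, na.rm = TRUE)$na.rm
#> [1] TRUE
```

In the two dots_first() calls that return FALSE, the stray value is not lost: it ends up in ..., which is why checking the dots is a useful companion to this pattern.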

The payoff of this pattern is not huge: it protects against a fairly unusual failure mode. However, the failure mode is silent (so easy to miss), and the pattern is very easy to apply, so I think the payoff is still worth it.

18.4 How do I do it?

Following this pattern is simple: just identify which arguments are data and which are details, and then put the ... in between.
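
As a minimal sketch of that layout, consider a hypothetical summarise_values() function (the name and its digits argument are invented for illustration). Here x is the data, na.rm and digits are details, and ... is forwarded to quantile():

```r
# Hypothetical example: data first, then dots, then details.
summarise_values <- function(x, ..., na.rm = FALSE, digits = 1) {
  # Extra arguments such as `probs` travel through ... to quantile().
  round(quantile(x, ..., na.rm = na.rm), digits)
}

# Details must be spelled out in full; `probs` is passed on via the dots.
summarise_values(c(1, 2, 3, NA), probs = 0.5, na.rm = TRUE)
#> 50% 
#>   2
```

Because the details sit after the dots, a later version could add another detail argument without changing what existing calls mean.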

18.5 How do I remediate it?

If you’ve already published a function where you’ve put ... in the wrong place, it’s easy to fix: move ... between the data and details arguments, then use a function from the ellipsis package to check that ... is as expected (e.g. from Chapters 21 or 19). Since using the full names for details arguments is good practice, making this change will typically affect little existing code, but it is an interface change so should be advertised prominently.

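# Before: details come ahead of the dots, so they can be matched by position
old_interface <- function(x, data1 = 1, data2 = 2, ...) {
}

# After: details follow the dots, so they must be fully named
new_interface <- function(x, ..., data1 = 1, data2 = 2) {
  ellipsis::check_dots_used()
}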

We can use this approach to make a safer version of median():

safe_median <- function(x, ..., na.rm = FALSE) {
  ellipsis::check_dots_used()
  median(x, ..., na.rm = na.rm)
}

safe_median(2, 3, 10)
#> Error: 2 components of `...` were not used.
#> 
#> We detected these problematic arguments:
#> * `..1`
#> * `..2`
#> 
#> Did you misspecify an argument?
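
If adding a dependency is not an option, a rough base R approximation is possible. The sketch below uses a hypothetical strict_median() that rejects any extra arguments outright. This is stricter than ellipsis::check_dots_used(), which only complains about dots that are never evaluated, and it assumes R 3.5.0 or later for ...length():

```r
# Hypothetical base R variant: refuses anything supplied through the dots.
strict_median <- function(x, ..., na.rm = FALSE) {
  if (...length() > 0) {
    stop("`...` must be empty; pass the data as a single vector `x`.",
         call. = FALSE)
  }
  median(x, na.rm = na.rm)
}

strict_median(c(2, 3, 10))
#> [1] 3
strict_median(c(2, NA, 10), na.rm = TRUE)
#> [1] 6

# The sum()-style call now fails loudly instead of silently returning 2.
tryCatch(strict_median(2, 3, 10), error = function(e) conditionMessage(e))
#> [1] "`...` must be empty; pass the data as a single vector `x`."
```

The trade-off is that this version cannot forward extra arguments to median() methods, so it only suits functions whose dots exist purely to force naming of the details.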