8 Avoid dependencies between arguments
8.1 What’s the problem?
Avoid creating dependencies between details arguments so that only certain combinations are permitted. Dependencies between arguments makes functions harder to use because you have to remember how arguments interact, and you when reading a call, you need to read multiple arguments before interpreting one.
8.2 What are some examples?
In
rep()
you can supply bothtimes
andeach
unlesstimes
is a vector:rep(1:3, times = 2, each = 3) #> [1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 rep(1:3, times = 1:3, each = 2) #> Error in rep(1:3, times = 1:3, each = 2): invalid 'times' argument
Learn more in Chapter 11.
In
var()
,na.rm
is only used ifuse
is not set. If you supply bothuse
andna.rm
,na.rm
is silently ignored.In
rgamma()
you can provide eitherscale
orrate
. If you supply both you will get an error or a warning:rgamma(5, shape = 1, rate = 2, scale = 1/2) #> Warning in rgamma(5, shape = 1, rate = 2, scale = 1/2): specify 'rate' or #> 'scale' but not both #> [1] 0.91093006 1.45306085 0.08907445 0.60093980 0.21717070 rgamma(5, shape = 1, rate = 2, scale = 2) #> Error in rgamma(5, shape = 1, rate = 2, scale = 2): specify 'rate' or 'scale' but not both
grepl()
hasperl
,fixed
, andignore.case
arguments which can either beTRUE
orFALSE
. Butfixed = TRUE
overridesperl = TRUE
, andignore.case
only works iffixed = FALSE
. Bothfixed
andperl
change how another argument,pattern
, is interpreted.In
library()
thecharacter.only
argument changes howpackage
is intepreted:"dplyr" ggplot2 <- # Loads ggplot2 library(ggplot2) # Loads dplyr library(ggplot2, character.only = TRUE)
forcats::fct_lump()
decides which algorithm to use based on a combination of then
andprop
arguments.In
ggplot2::geom_histogram()
, you can specify the histogram breaks in three ways: as a number ofbins
, as the width of each bin (binwidth
, pluscenter
orboundary
), or the exactbreaks
. You can only pick one of the three options, which is hard to convey in the documentation. There’s also an implied precedence so that if more than one option is supplied, one will silently win.In
readr::locale()
there’s a complex dependency betweendecimal_mark
andgrouping_mark
because they can’t be the same value, and the US and Europe use different standards.
See Sections 10.3.1 and 10.3.2 for a two exceptions where the dependency is via specific patterns of missing arguments.
8.3 Why is this important?
Having complicated interdependencies between arguments has major downsides:
It suggests that there are many more viable code paths than there really are and all those (unnecessary) possibilities still occupy head space. You have to memorise the set of allowed combinations, rather than them being implied by the structure of the function.
It increases implementation complexity. Interdependence of arguments suggests complex implementation paths which are harder to analyse and test.
It makes documentation harder to write. You have to use extra words to explain exactly how combinations of arguments work together, and it’s not obvious where those words should go. If there’s an interaction between
arg_a
andarg_b
do you document witharg_a
, witharg_b
, or with both?
8.4 How do I remediate?
Often these problems arise because the scope of a function grows over time. When the function was initially designed, the scope was small, and it grew incrementally over time. At no point did it seem worth the additional effort to refactor to a new design, but now you have a large complex function. This makes the problem hard to avoid.
To remediate the problem, you’ll need to think holistically and reconsider the complete interface. There are two common outcomes which are illustrated in the case studies below:
Splitting the function into multiple functions that each do one thing.
Encapulsating related details arguments into a single object.
See also larger case study in Chapter 11 where this problem is tangled up with other problems.
If these changes to the interface occur to exported functions in a package, you’ll need to consider how to preserve the interface with deprecation warnings. For important functions, it is worth generating an message that includes new code to copy and paste.
8.4.1 Case study: fct_lump()
There are many different ways to decide how to lump uncommon factor levels together, and initially we attempted to encode these through arguments to fct_lump()
. However, over time as the number of arguments increased, it gets harder and harder to tell what the options are. Currently there are three behaviours:
Both
n
andprop
missing - merge together the least frequent levels, ensuring thatother
is still the smallest level. (For this case, theties.method
argument is ignored.)Only
n
supplied: if positive, preservesn
most common values.Only
prop
supplied: if positive, preservesBoth
n
andprop
supplied: due to a bug in the code, this is treated the same way as bothn
andprop
missing! (But it really should be an error)
Would be better to break into three functions:
fct_lump_n(f, n)
fct_lump_prop(f, prop)
fct_lump_smallest(f)
That has three advantages:
The name of function helps remind you of the purpose.
There’s no way to supply both
n
andprop
.The
ties.method
argument would only appear infct_lump_n()
and_prop()
, not_smallest()
.
8.4.2 Case study: grepl()
vs stringr::str_detect()
grepl()
, has three arguments that take either FALSE
or TRUE
: ignore.case
, perl
, fixed
, which might suggest that there are 2 ^ 3 = 16 possible invocations. However, a number of combinations are not allowed:
grepl("a", letters, fixed = TRUE, ignore.case = TRUE)
x <-#> Warning in grepl("a", letters, fixed = TRUE, ignore.case = TRUE): argument
#> 'ignore.case = TRUE' will be ignored
grepl("a", letters, fixed = TRUE, perl = TRUE)
x <-#> Warning in grepl("a", letters, fixed = TRUE, perl = TRUE): argument 'perl =
#> TRUE' will be ignored
Part of this problem could be resolved by making it more clear that one important choice is the matching engine to use: POSIX 1003.2 extended regular expressions (the default), Perl-style regular expressions (perl = TRUE
) or fixed matching (fixed = TRUE
). A better approach would be to use the pattern in Chapter 12, and create a new argument called something like engine = c("POSIX", "perl", "fixed")
.
The other problem is that ignore.case
can only affect two of the three engines: POSIX and perl. This is hard to remedy without creating a completely new matching engine. Anything to do with case is always harder than you might expect because different languages have different rules.
stringr takes a different approach, encoding the engine as an attribute of the pattern:
library(stringr)
str_detect(letters, "a")
x <-# short for:
str_detect(letters, regex("a"))
x <- str_detect(letters, fixed("a"))
x <- str_detect(letters, coll("a")) x <-
This has the advantage that each engine can take different arguments.
An alternative approach would be to have a separate engine argument:
str_detect(letters, "a", engine = regex())
x <- str_detect(letters, "a", engine = fixed())
x <- str_detect(letters, "a", engine = coll()) x <-
This approach is a bit more discoverable (because there’s clearly another argument that affects the pattern), but it’s slightly less general, because of the boundary()
engine, which doesn’t match patterns but boundaries:
str_detect(letters, boundary("word"))
x <-# Seems confusing: now you can omit the pattern argument?
str_detect(letters, engine = boundary("word")) x <-
It would also mean that you had an argument engine
, that affected how another argument, pattern
, was interpreted, so it would repeat the problem in a slightly different form.
It’s appealing to all the details of the match wrapped up into a single object.