4 Name details arguments
4.1 What are data and details arguments?
The arguments to a function typically fall into two broad sets: one set supplies the data to compute on, and the other supplies arguments that control the details of the computation. For example:
In log(), the data is x, and the detail is the base of the logarithm.
In mean(), the data is x, and the details are how much data to trim from the ends (trim) and how to handle missing values (na.rm).
In t.test(), the data are x and y, and the details of the test are specified by the alternative, mu, paired, var.equal, and conf.level arguments.
Typically, data arguments don’t have default values and work with vectors or data frames, while details arguments have defaults and take single values (like TRUE or FALSE, or a single string that specifies a method).
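As a small illustration of this split, the sketch below inspects the signature of mean.default(), the default method behind mean(), and then calls it with and without a details argument. The vector x is made up for the example.

```r
# Inspect the signature of the default mean() method.
sig <- formals(mean.default)

names(sig)
#> [1] "x"     "trim"  "na.rm" "..."

# The details arguments carry single-value defaults.
sig$trim
#> [1] 0
sig$na.rm
#> [1] FALSE

# A typical call supplies the data, plus any details that differ
# from their defaults.
x <- c(1, 2, 3, 4, 100)  # illustrative data
mean(x)
#> [1] 22
mean(x, trim = 0.2)      # drops one value from each end
#> [1] 3
```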
4.2 What’s the pattern?
When calling a function, data arguments come first, specified by position, followed by details arguments specified by name.
y <- c(1:10, NA)

# Good
mean(y, na.rm = TRUE)
#> [1] 5.5
# Bad
mean(x = y, , TRUE)
#> [1] 5.5
mean(, TRUE, x = y)
#> [1] 5.5
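One way to see the risk of passing details by position is to supply TRUE as the second argument in the hope that it means na.rm. The sketch below captures whatever happens rather than asserting a specific outcome. In the versions of R this was checked against, the call signals an error because TRUE is matched to trim, which must be numeric, but the exact behavior is an assumption about your R version.

```r
y <- c(1:10, NA)

# Intended: drop missing values. By position, TRUE lands on `trim`,
# the second argument of mean.default(), not on `na.rm`.
res <- tryCatch(
  mean(y, TRUE),
  error = function(e) paste("Error:", conditionMessage(e))
)
res

# Naming the argument removes the ambiguity.
mean(y, na.rm = TRUE)
#> [1] 5.5
```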
Never use partial matching, which allows you to refer to an argument by a unique prefix, e.g. mean(x, n = TRUE). Partial matching was useful in the early days of R because when you were doing a quick and dirty interactive analysis you could save a little time by shortening argument names. However, today, most R editing environments support autocomplete, so partial matching only saves you a single keystroke, and it makes code substantially harder to read.
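The "unique prefix" requirement also makes partially matched calls fragile. The sketch below uses a hypothetical function, center(), invented purely for illustration: a call that relies on the prefix n works at first, then fails once a second argument beginning with "n" is added.

```r
# Hypothetical function, for illustration only.
center <- function(x, na.rm = FALSE) {
  x - mean(x, na.rm = na.rm)
}

# `n` is a unique prefix of `na.rm`, so partial matching works...
center(c(1, 2, NA), n = TRUE)
#> [1] -0.5  0.5   NA

# ...until a later version adds another argument starting with "n".
center <- function(x, na.rm = FALSE, normalize = FALSE) {
  out <- x - mean(x, na.rm = na.rm)
  if (normalize) out / sd(x, na.rm = na.rm) else out
}

# Now `n` is ambiguous, and R refuses to match it.
res <- tryCatch(
  center(c(1, 2, NA), n = TRUE),
  error = function(e) conditionMessage(e)
)
res
```

Spelling out na.rm = TRUE in the call would have kept working across both versions.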
You can make R warn you when you use a partially named argument by setting a special option. Call usethis::use_partial_warnings() to make this the default for all R sessions.
options(warnPartialMatchArgs = TRUE)
mean(x = 1:10, n = FALSE)
#> Warning in mean.default(x = 1:10, n = FALSE): partial argument match of 'n' to
#> 'na.rm'
#> [1] 5.5
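If you only want the warning in one part of a script, a minimal sketch is to save the previous option value and restore it afterwards. Here the warning is captured with tryCatch() so that its message can be inspected. The variable names old and msg are arbitrary.

```r
# Enable the warning, remembering the previous setting.
old <- options(warnPartialMatchArgs = TRUE)

# Capture the warning instead of letting it print.
msg <- tryCatch(
  mean(x = 1:10, n = FALSE),
  warning = function(w) conditionMessage(w)
)
msg
#> [1] "partial argument match of 'n' to 'na.rm'"

# Restore whatever was set before.
options(old)
```

Base R has two related options, warnPartialMatchDollar and warnPartialMatchAttr, which cover partial matching in `$` and attr() respectively.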
4.3 Why is this useful?
I think it’s reasonable to assume that if the reader knows what a function does, then they know what the data arguments are (and their order), so repeating their names just takes up space without aiding communication. This leads naturally to %>%, where you don’t specify the name of the first argument either (since it comes from the left-hand side of %>%).
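A short sketch of the same pattern with the pipe, assuming the magrittr package (which provides %>%) is installed:

```r
library(magrittr)  # provides %>%; assumed to be installed

y <- c(1:10, NA)

# The data argument comes from the left-hand side and is never named;
# the details argument is still spelled out.
y %>% mean(na.rm = TRUE)
#> [1] 5.5

# Equivalent to:
mean(y, na.rm = TRUE)
#> [1] 5.5
```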
However, I don’t think it’s reasonable to expect that people will remember the order of the details arguments. For example, I don’t think that most people know that the second argument to mean() is trim, even though mean() is an extremely commonly used function. Spelling the names out in
4.4 What are the exceptions?
I think the main exception to this rule is when you are teaching a function for the first time. It makes sense to emphasize the names of the data arguments to help people understand exactly what’s going on. For example, in R for Data Science, when we introduce ggplot2 we write code like:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
At the end of the chapter, we assume that the reader is familiar with the basic structure and so the rest of the book uses the style recommended here:
ggplot(mpg, aes(displ, hwy)) +
  geom_point()