20.5 Using tidy evaluation
While it’s important to understand how eval_tidy() works, most of the time you won’t call it directly. Instead, you’ll usually use it indirectly by calling a function that uses eval_tidy(). This section will give a few practical examples of wrapping functions that use tidy evaluation.
20.5.1 Quoting and unquoting
Imagine we have written a function that resamples a dataset:
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}
We want to create a new function that allows us to resample and subset in a single step. Our naive approach doesn’t work:
subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)
#> Error in eval_tidy(rows, data): object 'x' not found
subsample() doesn’t quote any arguments, so cond is evaluated normally (not in a data mask), and we get an error when it tries to find a binding for x. To fix this problem we need to quote cond, and then unquote it when we pass it on to subset2():
subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
subsample(df, x == 1)
#>   x y
#> 3 1 3
#> 1 1 1
#> 2 1 2
This is a very common pattern; whenever you call a quoting function with arguments from the user, you need to quote them and then unquote.
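The quote-then-unquote pattern can be tried end to end with the sketch below. It assumes rlang is installed. subset2() is used but not defined in this section, so the version here is an assumed minimal definition, chosen to be consistent with the eval_tidy(rows, data) call in the error message above. The cutoff variable is hypothetical and is there only to show that the captured quosure still finds variables from the caller's environment.

```r
library(rlang)

# Assumed minimal definition of subset2(); not taken from this section
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)        # quote the user's expression
  df <- subset2(df, !!cond)  # unquote it into the quoting function
  resample(df, n)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# `cutoff` (hypothetical) lives in the caller's environment, not in df;
# the quosure records that environment, so the lookup still succeeds
cutoff <- 1
out <- subsample(df, x == cutoff)
```

Because n defaults to nrow(df) and is only evaluated when resample() first needs it, it picks up the row count of the already-subsetted data frame.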
20.5.2 Handling ambiguity
In the case above, we needed to think about tidy evaluation because of quasiquotation. We also need to think about tidy evaluation even when the wrapper doesn’t need to quote any arguments. Take this wrapper around subset2():
threshold_x <- function(df, val) {
  subset2(df, x >= val)
}
This function can silently return an incorrect result in two situations:
When x exists in the calling environment, but not in df:
x <- 10
no_x <- data.frame(y = 1:3)
threshold_x(no_x, 2)
#>   y
#> 1 1
#> 2 2
#> 3 3
When val exists in df:
has_val <- data.frame(x = 1:3, val = 9:11)
threshold_x(has_val, 2)
#> [1] x   val
#> <0 rows> (or 0-length row.names)
These failure modes arise because tidy evaluation is ambiguous: each variable can be found in either the data mask or the environment. To make this function safe we need to remove the ambiguity using the .data and .env pronouns:
threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}
x <- 10
threshold_x(no_x, 2)
#> Error: Column `x` not found in `.data`
threshold_x(has_val, 2)
#>   x val
#> 2 2  10
#> 3 3  11
Generally, whenever you use the .env pronoun, you can use unquoting instead:
threshold_x <- function(df, val) {
  subset2(df, .data$x >= !!val)
}
There are subtle differences in when val is evaluated. If you unquote, val is evaluated early, when enquo() captures the expression; if you use a pronoun, val is evaluated lazily, when eval_tidy() evaluates the expression. These differences are usually unimportant, so pick the form that looks most natural.
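As a quick check that the two forms agree in the ordinary case, the sketch below defines both variants side by side and compares their results. It assumes rlang is installed and reuses an assumed minimal subset2() (this section uses that function without defining it). The names threshold_env() and threshold_bang() are hypothetical, introduced only so both versions can coexist.

```r
library(rlang)

# Assumed minimal definition of subset2(); not taken from this section
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Variant 1: val is looked up through the .env pronoun at evaluation time
threshold_env <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

# Variant 2: the value of val is inlined when the expression is captured
threshold_bang <- function(df, val) {
  subset2(df, .data$x >= !!val)
}

has_val <- data.frame(x = 1:3, val = 9:11)

res_env <- threshold_env(has_val, 2)
res_bang <- threshold_bang(has_val, 2)

# Both ignore the `val` column in the data and use the argument instead
identical(res_env, res_bang)
```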
20.5.3 Quoting and ambiguity
To finish our discussion, let’s consider the case where we have both quoting and potential ambiguity. I’ll generalise threshold_x() slightly so that the user can pick the variable used for thresholding. Here I use .data[[var]] because it makes the code a little simpler; in the exercises you’ll have a chance to explore how you might use $ instead.
threshold_var <- function(df, var, val) {
  var <- as_string(ensym(var))
  subset2(df, .data[[var]] >= !!val)
}
df <- data.frame(x = 1:10)
threshold_var(df, x, 8)
#>     x
#> 8   8
#> 9   9
#> 10 10
It is not always the responsibility of the function author to avoid ambiguity. Imagine we generalise further to allow thresholding based on any expression:
threshold_expr <- function(df, expr, val) {
  expr <- enquo(expr)
  subset2(df, !!expr >= !!val)
}
It’s not possible to evaluate expr only in the data mask, because the data mask doesn’t include any functions like + or ==. Here, it’s the user’s responsibility to avoid ambiguity. As a general rule of thumb, as a function author it’s your responsibility to avoid ambiguity with any expressions that you create; it’s the user’s responsibility to avoid ambiguity in expressions that they create.
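To illustrate what avoiding ambiguity can look like from the user's side, the sketch below calls threshold_expr() twice on a data frame where the name k exists both as a column and as a variable in the calling environment. It assumes rlang is installed and, as before, an assumed minimal subset2(). The data frame and the variable k are made up for the example.

```r
library(rlang)

# Assumed minimal definition of subset2(); not taken from this section
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_expr <- function(df, expr, val) {
  expr <- enquo(expr)
  subset2(df, !!expr >= !!val)
}

# Hypothetical data: `k` is both a column and an environment variable
df <- data.frame(x = 1:5, k = 5:1)
k <- 2

# Ambiguous: the data mask wins, so `k` silently means the column
ambiguous <- threshold_expr(df, x * k, 8)

# Explicit: the user states which `x` and which `k` they mean
explicit <- threshold_expr(df, .data$x * .env$k, 8)
```

The two calls select different rows, which is the kind of silent discrepancy the pronouns are meant to rule out.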