Collectively, the data structures present in the AST are called expressions. An expression is any member of the set of base types created by parsing code: constant scalars, symbols, call objects, and pairlists. These are the data structures used to represent captured code from expr(), and is_expression(expr(...)) is always true. Constants, symbols and call objects are the most important, and are discussed below. Pairlists and empty symbols are more specialised and we’ll come back to them in Sections 18.6.1 and 18.6.2.
NB: In base R documentation “expression” is used to mean two things. As well as the definition above, expression is also used to refer to the type of object returned by parse(), which is basically a list of expressions as defined above. In this book I’ll call these expression vectors, and I’ll come back to them in Section 18.6.3.
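As a rough illustration of the second meaning, the sketch below uses base R's parse() to build an expression vector and then inspects its elements. The input string is made up for the example.

```r
# parse() returns an expression vector: a list-like container of expressions.
exprs <- parse(text = "1; x; f(y)", keep.source = FALSE)

typeof(exprs)   # the container itself has type "expression"
length(exprs)   # one element per top-level expression in the text

# Each element is an expression in this chapter's sense.
is.numeric(exprs[[1]])  # a scalar constant
is.symbol(exprs[[2]])   # a symbol
is.call(exprs[[3]])     # a call object
```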
Scalar constants are the simplest component of the AST. More precisely, a constant is either NULL or a length-1 atomic vector (or scalar, Section 3.2.1) like TRUE, 1L, 2.5 or "x". You can test for a constant with rlang::is_syntactic_literal().
Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:
identical(expr(TRUE), TRUE)
#> [1] TRUE
identical(expr(1), 1)
#> [1] TRUE
identical(expr(2L), 2L)
#> [1] TRUE
identical(expr("x"), "x")
#> [1] TRUE
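The flip side is worth checking too. In this sketch, which uses base R's quote() in place of expr() so that it runs without extra packages, anything that needs a function call to build is captured as a call rather than as a constant:

```r
# Scalars are self-quoting: capturing them returns the value itself.
identical(quote(2L), 2L)
identical(quote(NULL), NULL)

# A longer vector has to be built by calling c(), so capturing it
# yields a call object, not a constant.
v <- quote(c(1, 2))
is.call(v)
is.atomic(v)

# A negative number is also a call: unary minus applied to 1.
neg <- quote(-1)
is.call(neg)
```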
A symbol represents the name of an object like mean. In base R, the terms symbol and name are used interchangeably (i.e. is.name() is identical to is.symbol()), but in this book I use symbol consistently because “name” has many other meanings.
You can create a symbol in two ways: by capturing code that references an object with expr(), or turning a string into a symbol with rlang::sym():
expr(x)
#> x
sym("x")
#> x
You can turn a symbol back into a string with as.character() or rlang::as_string(). as_string() has the advantage of clearly signalling that you’ll get a character vector of length 1.
as_string(expr(x))
#> [1] "x"
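Base R offers the same round trip. The sketch below assumes only base functions: as.name() plays the role of sym(), and as.character() the role of as_string().

```r
# Two routes to the same symbol: capture code, or convert a string.
s1 <- quote(x)
s2 <- as.name("x")
identical(s1, s2)

# Convert back to a string.
as.character(s1)

# Non-syntactic names are allowed when you start from a string.
odd <- as.name("my var")
as.character(odd)
```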
You can recognise a symbol because it’s printed without quotes, str() tells you that it’s a symbol, and is.symbol() is TRUE:
str(expr(x))
#>  symbol x
is.symbol(expr(x))
#> [1] TRUE
The symbol type is not vectorised, i.e. a symbol is always length 1. If you want multiple symbols, you’ll need to put them in a list, using (e.g.) rlang::syms().
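A minimal base R sketch of that idea, using lapply() with as.name() to build the list (the variable names are placeholders):

```r
vars <- c("mpg", "cyl", "disp")

# One symbol per string, collected in an ordinary list.
symbols <- lapply(vars, as.name)

length(symbols)
vapply(symbols, is.symbol, logical(1))

# Each individual symbol still has length 1.
length(symbols[[1]])
```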
A call object represents a captured function call. Call objects are a special type of list where the first component specifies the function to call (usually a symbol), and the remaining elements are the arguments for that call. Call objects create branches in the AST, because calls can be nested inside other calls.
You can identify a call object when printed because it looks just like a function call. Confusingly, typeof() and str() print “language” [63] for call objects, but is.call() returns TRUE:
lobstr::ast(read.table("important.csv", row.names = FALSE))
#> █─read.table
#> ├─"important.csv"
#> └─row.names = FALSE
x <- expr(read.table("important.csv", row.names = FALSE))

typeof(x)
#> [1] "language"
is.call(x)
#> [1] TRUE
Calls generally behave like lists, i.e. you can use standard subsetting tools. The first element of the call object is the function to call, which is usually a symbol:
x[[1]]
#> read.table
is.symbol(x[[1]])
#> [1] TRUE
The remainder of the elements are the arguments:
as.list(x[-1])
#> [[1]]
#> [1] "important.csv"
#>
#> $row.names
#> [1] FALSE
You can extract individual arguments with [[ or, if named, $:
x[[2]]
#> [1] "important.csv"
x$row.names
#> [1] FALSE
You can determine the number of arguments in a call object by subtracting 1 from its length:
length(x) - 1
#> [1] 2
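Putting these pieces together, the following sketch defines a small helper, describe_call() (a hypothetical name, not part of base R or rlang), that reports a call's function, argument count, and argument names using only list-style operations. It uses quote() in place of expr() so that it runs in base R.

```r
# Hypothetical helper: summarise a call using list-style subsetting.
describe_call <- function(call) {
  stopifnot(is.call(call))
  args <- as.list(call[-1])
  arg_names <- names(args)
  if (is.null(arg_names)) {
    arg_names <- rep("", length(args))  # no argument was named
  }
  list(
    fn = deparse(call[[1]]),    # function position, as text
    n_args = length(call) - 1,  # everything after the first element
    arg_names = arg_names       # "" marks an unnamed argument
  )
}

info <- describe_call(quote(read.table("important.csv", row.names = FALSE)))
info$fn
info$n_args
info$arg_names
```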
Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: an argument could be in any location, and could be supplied with its full name, with an abbreviated name, or with no name. To work around this problem, you can use rlang::call_standardise(), which standardises all arguments to use the full name:
rlang::call_standardise(x)
#> read.table(file = "important.csv", row.names = FALSE)
(NB: If the function uses ... it’s not possible to standardise all arguments.)
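Base R's match.call() performs a similar standardisation when given the function definition and a captured call. The sketch below uses a made-up function f() so the matching rules are easy to see. It illustrates the idea only and is not a description of how rlang implements it.

```r
# A made-up function, used only to demonstrate argument matching.
f <- function(alpha, beta = 2, ...) NULL

# beta is abbreviated and supplied first; alpha is matched by position.
messy <- quote(f(be = 3, 1))
std <- match.call(f, messy)
std

names(as.list(std))[-1]

# Arguments absorbed by ... have no formal name to standardise to,
# so they stay as they were written.
dots <- match.call(f, quote(f(1, 2, 3)))
dots
```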
Calls can be modified in the same way as lists:
x$header <- TRUE
x
#> read.table("important.csv", row.names = FALSE, header = TRUE)
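The same list-style assignment works for positions as well as names. A small sketch, with quote() standing in for expr() and with file names invented for the example:

```r
x <- quote(read.table("important.csv", row.names = FALSE))

# Replace the function position with a different symbol.
x[[1]] <- quote(read.csv)

# Replace one argument by position, and another by name.
x[[2]] <- "other.csv"
x$row.names <- TRUE

x
```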
Function position
The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol [64]:
lobstr::ast(foo())
#> █─foo
While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:
lobstr::ast("foo"())
#> █─foo
However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: for example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:
lobstr::ast(pkg::foo(1))
#> █─█─`::`
#> │ ├─pkg
#> │ └─foo
#> └─1
lobstr::ast(obj$foo(1))
#> █─█─`$`
#> │ ├─obj
#> │ └─foo
#> └─1
lobstr::ast(foo(1)(2))
#> █─█─foo
#> │ └─1
#> └─2
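You can confirm this structure by subsetting. In the sketch below (base R only; pkg and foo are placeholder names that are never evaluated), the first element of each call is itself a call:

```r
x <- quote(pkg::foo(1))

# The function position holds the call pkg::foo, not a symbol.
is.call(x[[1]])
x[[1]][[1]]          # the `::` function
as.list(x[[1]])[-1]  # its arguments: pkg and foo

# The same holds for a call to the result of a function factory.
y <- quote(foo(1)(2))
is.call(y[[1]])
y[[1]][[1]]
```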
You can construct a call object from its components using rlang::call2(). The first argument is the name of the function to call (either as a string, a symbol, or another call). The remaining arguments will be passed along to the call:
call2("mean", x = expr(x), na.rm = TRUE)
#> mean(x = x, na.rm = TRUE)
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
#> base::mean(x = x, na.rm = TRUE)
Infix calls created in this way still print as usual.
call2("<-", expr(x), 10)
#> x <- 10
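Base R has counterparts to call2(): call() takes the function name as a string, and as.call() turns a list whose first element is the function into a call. A minimal sketch that builds a nested call from the inside out and then evaluates it:

```r
# Inner call first: 1 + 2
inner <- call("+", 1, 2)

# Outer call: multiply the inner call by 3. The inner call becomes
# the first argument, so it forms a branch in the AST.
product <- call("*", inner, 3)
eval(product)
#> [1] 9

# as.call() accepts a list: function first, then (optionally named) arguments.
m <- as.call(list(as.name("mean"), x = quote(v), na.rm = TRUE))
m
#> mean(x = v, na.rm = TRUE)
```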
Using call2() to create complex expressions is a bit clunky. You’ll learn another technique in Chapter 19.
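The following table summarises the appearance of the different expression subtypes in str() and typeof():
| |str()|typeof()|
|---|---|---|
|Scalar constant|logi/int/num/chr|logical/integer/double/character|
|Symbol|symbol|symbol|
|Call object|language|language|
|Pairlist|Dotted pair list|pairlist|
|Expression vector|expression()|expression|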
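The typeof() labels can be checked directly. A base R sketch with one example value per subtype (the values themselves are arbitrary):

```r
examples <- list(
  constant   = 1L,
  symbol     = quote(x),
  call       = quote(f(x)),
  pairlist   = pairlist(a = 1),
  expression = expression(x + 1)
)

# typeof() applied to each example, returned as a named character vector.
types <- vapply(examples, typeof, character(1))
types
```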
Both base R and rlang provide functions for testing for each type of input, although the types covered are slightly different. You can easily tell them apart because all the base functions start with is. and the rlang functions start with is_.
Which two of the six types of atomic vector can’t appear in an expression? Why? Similarly, why can’t you create an expression that contains an atomic vector of length greater than one?
What happens when you subset a call object to remove the first element? e.g. expr(read.csv("foo.csv", header = TRUE))[-1]. Why?
Describe the differences between the following call objects.
x <- 1:10

call2(median, x, na.rm = TRUE)
call2(expr(median), x, na.rm = TRUE)
call2(median, expr(x), na.rm = TRUE)
call2(expr(median), expr(x), na.rm = TRUE)
rlang::call_standardise() doesn’t work so well for the following calls. Why? What makes mean() special?
call_standardise(quote(mean(1:10, na.rm = TRUE)))
#> mean(x = 1:10, na.rm = TRUE)
call_standardise(quote(mean(n = T, 1:10)))
#> mean(x = 1:10, n = T)
call_standardise(quote(mean(x = 1:10, , TRUE)))
#> mean(x = 1:10, , TRUE)
Why does this code not make sense?
x <- expr(foo(x = 1))
names(x) <- c("x", "y")
Construct the expression if(x > 1) "a" else "b" using multiple calls to call2(). How does the code structure reflect the structure of the AST?
[63] Avoid is.language(), which returns TRUE for symbols, calls, and expression vectors.
[64] Peculiarly, it can also be a number, as in the expression 3(). But this call will always fail to evaluate because a number is not a function.