18.3 Expressions

Collectively, the data structures present in the AST are called expressions. An expression is any member of the set of base types created by parsing code: constant scalars, symbols, call objects, and pairlists. These are the data structures used to represent captured code from expr(), and is_expression(expr(...)) is always true[1]. Constants, symbols and call objects are the most important, and are discussed below. Pairlists and empty symbols are more specialised and we’ll come back to them in Sections 18.6.1 and 18.6.2.

NB: In base R documentation “expression” is used to mean two things. As well as the definition above, expression is also used to refer to the type of object returned by expression() and parse(), which are basically lists of expressions as defined above. In this book I’ll call these expression vectors, and I’ll come back to them in Section 18.6.3.

18.3.1 Constants

Scalar constants are the simplest component of the AST. More precisely, a constant is either NULL or a length-1 atomic vector (or scalar, Section 3.2.1) like TRUE, 1L, 2.5 or "x". You can test for a constant with rlang::is_syntactic_literal().

Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:

identical(expr(TRUE), TRUE)
#> [1] TRUE
identical(expr(1), 1)
#> [1] TRUE
identical(expr(2L), 2L)
#> [1] TRUE
identical(expr("x"), "x")
#> [1] TRUE
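
The boundary between constants and calls is easy to misjudge, so here is a minimal sketch of it. It uses base R's quote() as a stand-in for expr() so that it runs without rlang; that substitution is an assumption of this example, not something the text above relies on. Negative numbers and vectors built with c() look like constants but are captured as calls.

```r
# quote() is used here as a base R stand-in for rlang::expr().

# NULL and length-1 atomic vectors are captured as themselves.
identical(quote(2.5), 2.5)
#> [1] TRUE
identical(quote(NULL), NULL)
#> [1] TRUE

# -1 is not a constant: it is a call to the unary `-` function with argument 1.
is.call(quote(-1))
#> [1] TRUE

# c(1, 2) is a call to c(), not a length-2 vector.
is.call(quote(c(1, 2)))
#> [1] TRUE
```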

18.3.2 Symbols

A symbol represents the name of an object like x, mtcars, or mean. In base R, the terms symbol and name are used interchangeably (i.e. is.name() is identical to is.symbol()), but in this book I use symbol consistently because “name” has many other meanings.

You can create a symbol in two ways: by capturing code that references an object with expr(), or turning a string into a symbol with rlang::sym():

expr(x)
#> x
sym("x")
#> x

You can turn a symbol back into a string with as.character() or rlang::as_string(). as_string() has the advantage of clearly signalling that you’ll get a character vector of length 1.

as_string(expr(x))
#> [1] "x"

You can recognise a symbol because it’s printed without quotes, str() tells you that it’s a symbol, and is.symbol() is TRUE:

str(expr(x))
#>  symbol x
is.symbol(expr(x))
#> [1] TRUE

The symbol type is not vectorised, i.e. a symbol is always length 1. If you want multiple symbols, you’ll need to put them in a list, using (e.g.) rlang::syms().
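
As a rough illustration of holding several symbols in a list, the sketch below builds one from a character vector. It uses base R's lapply() and as.symbol() rather than rlang::syms() so that it has no dependencies; the variable names are hypothetical.

```r
# Base R sketch: build a list of symbols from a character vector.
var_names <- c("x", "y", "z")   # hypothetical names, for illustration only
symbols <- lapply(var_names, as.symbol)

# Each element is a symbol, and each symbol has length 1.
vapply(symbols, is.symbol, logical(1))
#> [1] TRUE TRUE TRUE
lengths(symbols)
#> [1] 1 1 1

# Round trip back to strings.
vapply(symbols, as.character, character(1))
#> [1] "x" "y" "z"
```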

18.3.3 Calls

A call object represents a captured function call. Call objects are a special type of list[2] where the first component specifies the function to call (usually a symbol), and the remaining elements are the arguments for that call. Call objects create branches in the AST, because calls can be nested inside other calls.

You can identify a call object when printed because it looks just like a function call. Confusingly, typeof() and str() print “language”[3] for call objects, but is.call() returns TRUE:

lobstr::ast(read.table("important.csv", row.names = FALSE))
#> █─read.table 
#> ├─"important.csv" 
#> └─row.names = FALSE
x <- expr(read.table("important.csv", row.names = FALSE))

typeof(x)
#> [1] "language"
is.call(x)
#> [1] TRUE

Subsetting

Calls generally behave like lists, i.e. you can use standard subsetting tools. The first element of the call object is the function to call, which is usually a symbol:

x[[1]]
#> read.table
is.symbol(x[[1]])
#> [1] TRUE

The remainder of the elements are the arguments:

as.list(x[-1])
#> [[1]]
#> [1] "important.csv"
#> 
#> $row.names
#> [1] FALSE

You can extract individual arguments with [[ or, if named, $:

x[[2]]
#> [1] "important.csv"
x$row.names
#> [1] FALSE

You can determine the number of arguments in a call object by subtracting 1 from its length:

length(x) - 1
#> [1] 2
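
The subsetting operations above can be combined into a small summary function. This is only a sketch: describe_call() is a hypothetical helper (not part of base R or rlang), and quote() stands in for expr() so the block runs without rlang.

```r
# describe_call() is a hypothetical helper, not part of base R or rlang.
describe_call <- function(call) {
  stopifnot(is.call(call))
  args <- as.list(call[-1])        # everything after the function position
  list(
    fn = call[[1]],                # function position (usually a symbol)
    n_args = length(call) - 1,     # number of arguments
    arg_names = names(args)        # "" marks an unnamed argument
  )
}

x <- quote(read.table("important.csv", row.names = FALSE))
info <- describe_call(x)

info$fn
#> read.table
info$n_args
#> [1] 2
identical(info$arg_names, c("", "row.names"))
#> [1] TRUE
```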

Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: an argument could be in any position, and could be supplied with its full name, an abbreviated name, or no name at all. To work around this problem, you can use rlang::call_standardise(), which standardises all arguments to use the full name:

rlang::call_standardise(x)
#> read.table(file = "important.csv", row.names = FALSE)

(NB: If the function uses ... it’s not possible to standardise all arguments.)
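
If call_standardise() is unavailable or deprecated in your version of rlang, base R's match.call() can do a similar job when you pass it the function definition and the captured call. The sketch below assumes that substitution and uses quote() in place of expr(); it is an illustration of the idea rather than a drop-in replacement.

```r
x <- quote(read.table("important.csv", row.names = FALSE))

# match.call() takes a function definition and an unevaluated call, and
# returns the call with arguments matched to their full formal names.
std <- match.call(read.table, x)
std
#> read.table(file = "important.csv", row.names = FALSE)

# Once standardised, arguments can be extracted reliably by name.
std$file
#> [1] "important.csv"

# Abbreviated names are expanded too: `row` partially matches `row.names`.
abbreviated <- quote(read.table("important.csv", row = FALSE))
match.call(read.table, abbreviated)$row.names
#> [1] FALSE
```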

Calls can be modified in the same way as lists:

x$header <- TRUE
x
#> read.table("important.csv", row.names = FALSE, header = TRUE)

Function position

The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol[4]:

lobstr::ast(foo())
#> █─foo

While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:

lobstr::ast("foo"())
#> █─foo
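
The same point can be checked programmatically. In this minimal sketch, quote() stands in for expr() so that no packages are needed, and foo is just a placeholder name that is never evaluated.

```r
# quote() is a base R stand-in for rlang::expr(); foo is never evaluated.
plain  <- quote(foo(1))
quoted <- quote("foo"(1))   # function name written as a string

# In both cases the function position holds the symbol `foo`.
is.symbol(plain[[1]])
#> [1] TRUE
is.symbol(quoted[[1]])
#> [1] TRUE

# The two captured calls are indistinguishable after parsing.
identical(plain, quoted)
#> [1] TRUE
```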

However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: for example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:

lobstr::ast(pkg::foo(1))
#> █─█─`::` 
#> │ ├─pkg 
#> │ └─foo 
#> └─1
lobstr::ast(obj$foo(1))
#> █─█─`$` 
#> │ ├─obj 
#> │ └─foo 
#> └─1
lobstr::ast(foo(1)(2))
#> █─█─foo 
#> │ └─1 
#> └─2

Constructing

You can construct a call object from its components using rlang::call2(). The first argument is the name of the function to call (either as a string, a symbol, or another call). The remaining arguments will be passed along to the call:

call2("mean", x = expr(x), na.rm = TRUE)
#> mean(x = x, na.rm = TRUE)
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
#> base::mean(x = x, na.rm = TRUE)

Infix calls created in this way still print as usual.

call2("<-", expr(x), 10)
#> x <- 10

Using call2() to create complex expressions is a bit clunky. You’ll learn another technique in Chapter 19.
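
To get a feel for why, here is a sketch that builds x + y * 2 from the bottom up. It uses base R's call() in place of call2() so that it runs without rlang; that swap, and the values given to x and y, are assumptions made for this illustration.

```r
# Base R's call() stands in for rlang::call2() here.
inner <- call("*", quote(y), 2)      # y * 2
outer <- call("+", quote(x), inner)  # x + y * 2
outer
#> x + y * 2

# The nesting mirrors the AST: `+` is the root and `*` is one of its children.
identical(outer[[3]], inner)
#> [1] TRUE

# Evaluate with hypothetical values for x and y.
eval(outer, list(x = 1, y = 3))
#> [1] 7
```

Each level of nesting needs its own constructor call, so the code grows quickly as the expression gets deeper.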

18.3.4 Summary

The following table summarises the appearance of the different expression subtypes in str() and typeof():

                    str()              typeof()
Scalar constant     logi/int/num/chr   logical/integer/double/character
Symbol              symbol             symbol
Call object         language           language
Pairlist            Dotted pair list   pairlist
Expression vector   expression()       expression

Both base R and rlang provide functions for testing for each type of input, although the types covered are slightly different. You can easily tell them apart because all the base functions start with is. and the rlang functions start with is_.

                    base               rlang
Scalar constant     -                  is_syntactic_literal()
Symbol              is.symbol()        is_symbol()
Call object         is.call()          is_call()
Pairlist            is.pairlist()      is_pairlist()
Expression vector   is.expression()    -
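
The base predicates in the table can be combined into a single classifier. The sketch below is one possible arrangement: expr_kind() is a hypothetical helper written for this illustration, and the constant test is a hand-rolled base R approximation rather than rlang's is_syntactic_literal().

```r
# expr_kind() is a hypothetical helper written for this illustration.
expr_kind <- function(x) {
  if (is.null(x) || (is.atomic(x) && length(x) == 1)) {
    "constant"            # checked first: is.pairlist(NULL) is also TRUE
  } else if (is.symbol(x)) {
    "symbol"
  } else if (is.call(x)) {
    "call"
  } else if (is.pairlist(x)) {
    "pairlist"
  } else if (is.expression(x)) {
    "expression vector"
  } else {
    typeof(x)             # anything else, e.g. a longer atomic vector
  }
}

expr_kind(quote(1L))
#> [1] "constant"
expr_kind(quote(x))
#> [1] "symbol"
expr_kind(quote(f(x)))
#> [1] "call"
expr_kind(formals(function(a, b = 2) NULL))
#> [1] "pairlist"
expr_kind(expression(a, b + 1))
#> [1] "expression vector"
```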

18.3.5 Exercises

  1. Which two of the six types of atomic vector can’t appear in an expression? Why? Similarly, why can’t you create an expression that contains an atomic vector of length greater than one?

  2. What happens when you subset a call object to remove the first element? e.g. expr(read.csv("foo.csv", header = TRUE))[-1]. Why?

  3. Describe the differences between the following call objects.

    x <- 1:10
    call2(median, x, na.rm = TRUE)
    call2(expr(median), x, na.rm = TRUE)
    call2(median, expr(x), na.rm = TRUE)
    call2(expr(median), expr(x), na.rm = TRUE)

  4. rlang::call_standardise() doesn’t work so well for the following calls. Why? What makes mean() special?

    call_standardise(quote(mean(1:10, na.rm = TRUE)))
    #> mean(x = 1:10, na.rm = TRUE)
    call_standardise(quote(mean(n = T, 1:10)))
    #> mean(x = 1:10, n = T)
    call_standardise(quote(mean(x = 1:10, , TRUE)))
    #> mean(x = 1:10, , TRUE)

  5. Why does this code not make sense?

    x <- expr(foo(x = 1))
    names(x) <- c("x", "y")

  6. Construct the expression if(x > 1) "a" else "b" using multiple calls to call2(). How does the code structure reflect the structure of the AST?

  1. It is possible to insert any other base object into an expression, but this is unusual and only needed in rare circumstances. We’ll come back to that idea in Section 19.4.7.

  2. More precisely, they’re pairlists (Section 18.6.1), but this distinction rarely matters.

  3. Avoid is.language(), which returns TRUE for symbols, calls, and expression vectors.

  4. Peculiarly, it can also be a number, as in the expression 3(). But this call will always fail to evaluate because a number is not a function.