4.3 Selecting a single element

There are two other subsetting operators: [[ and $. [[ is used for extracting single items, while x$y is a useful shorthand for x[["y"]].

4.3.1 [[

[[ is most important when working with lists because subsetting a list with [ always returns a smaller list. To help make this easier to understand we can use a metaphor:

If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6.

— @RLangTip, https://twitter.com/RLangTip/status/268375867468681216

Let’s use this metaphor to make a simple list:

x <- list(1:3, "a", 4:6)

When extracting a single element, you have two options: you can create a smaller train, i.e., fewer carriages, or you can extract the contents of a particular carriage. This is the difference between [ and [[.
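A minimal sketch of the two options, using the list x defined above (the variable names smaller_train and contents are illustrative only):

```r
x <- list(1:3, "a", 4:6)

# `[` keeps the carriage: the result is still a list, just a shorter one
smaller_train <- x[1]
is.list(smaller_train)
length(smaller_train)

# `[[` unloads the carriage: the result is the object that was stored inside
contents <- x[[1]]
is.list(contents)
contents
```

Here smaller_train is a list of length one, while contents is the integer vector 1:3 itself.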

When extracting multiple (or even zero!) elements, you have to make a smaller train.
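The following sketch shows a few such smaller trains built with [ from the same list x; the variable names are illustrative only:

```r
x <- list(1:3, "a", 4:6)

two_cars  <- x[1:2]      # carriages 1 and 2
without_2 <- x[-2]       # every carriage except the second
repeated  <- x[c(1, 1)]  # carriage 1, twice
empty     <- x[0]        # no carriages at all, but still a train

# Every result is a list, whatever its length
sapply(list(two_cars, without_2, repeated, empty), is.list)
sapply(list(two_cars, without_2, repeated, empty), length)
```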

Because [[ can return only a single item, you must use it with either a single positive integer or a single string. If you use a vector with [[, it will subset recursively, i.e. x[[c(1, 2)]] is equivalent to x[[1]][[2]]. This is a quirky feature that few know about, so I recommend avoiding it in favour of purrr::pluck(), which you’ll learn about in Section 4.3.3.
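The recursive behaviour described above can be checked directly on the list x. This is only a small sketch; the names recursive and stepwise are introduced here for illustration:

```r
x <- list(1:3, "a", 4:6)

# A length-2 index drills down: first into element 1, then into its 2nd value
recursive <- x[[c(1, 2)]]

# The explicit, two-step spelling of the same operation
stepwise <- x[[1]][[2]]

identical(recursive, stepwise)
```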

While you must use [[ when working with lists, I’d also recommend using it with atomic vectors whenever you want to extract a single value. For example, instead of writing:

for (i in 2:length(x)) {
  out[i] <- fun(x[i], out[i - 1])
}

It’s better to write:

for (i in 2:length(x)) {
  out[[i]] <- fun(x[[i]], out[[i - 1]])
}

Doing so reinforces the expectation that you are getting and setting individual values.
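As a runnable sketch of this pattern, the loop below computes a running total, with + standing in for the unspecified fun(). The data and variable names are assumptions made for illustration. It also shows a practical side effect of the habit: [[ fails loudly on an out-of-bounds position where [ would silently return NA.

```r
x <- c(2, 4, 6, 8)
out <- numeric(length(x))
out[[1]] <- x[[1]]

for (i in 2:length(x)) {
  # Each step reads exactly one value and writes exactly one value
  out[[i]] <- x[[i]] + out[[i - 1]]
}
out

# Out-of-bounds access: `[` returns NA, `[[` signals an error
x[5]
oob <- tryCatch(x[[5]], error = function(e) "error")
oob
```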

4.3.2 $

$ is a shorthand operator: x$y is roughly equivalent to x[["y"]]. It’s often used to access variables in a data frame, as in mtcars$cyl or diamonds$carat. One common mistake with $ is to use it when you have the name of a column stored in a variable:

var <- "cyl"
# Doesn't work - mtcars$var translated to mtcars[["var"]]
mtcars$var
#> NULL

# Instead use [[
mtcars[[var]]
#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

The one important difference between $ and [[ is that $ does (left-to-right) partial matching:

x <- list(abc = 1)
x$a
#> [1] 1
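For contrast, here is a short sketch of how [[ treats the same abbreviated name. By default [[ matches names exactly; its exact argument can be set to FALSE to opt in to partial matching. The variable names are illustrative only.

```r
x <- list(abc = 1)

dollar  <- x$a                      # `$` partially matches "a" to "abc"
strict  <- x[["a"]]                 # `[[` matches exactly by default
relaxed <- x[["a", exact = FALSE]]  # partial matching must be requested

dollar
is.null(strict)
relaxed
```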

To help avoid this behaviour I highly recommend setting the global option warnPartialMatchDollar to TRUE:

options(warnPartialMatchDollar = TRUE)
x$a
#> Warning in x$a: partial match of 'a' to 'abc'
#> [1] 1

(For data frames, you can also avoid this problem by using tibbles, which never do partial matching.)

4.3.3 Missing and out-of-bounds indices

It’s useful to understand what happens with [[ when you use an “invalid” index. The following table summarises what happens when you subset a logical vector, list, and NULL with a zero-length object (like NULL or logical()), out-of-bounds values (OOB), or a missing value (e.g. NA_integer_) with [[. Each cell shows the result of subsetting the data structure named in the row by the type of index described in the column. I’ve only shown the results for logical vectors, but other atomic vectors behave similarly, returning elements of the same type (NB: int = integer; chr = character).

row[[col]]   Zero-length   OOB (int)   OOB (chr)   Missing
Atomic       Error         Error       Error       Error
List         Error         Error       NULL        NULL
NULL         NULL          NULL        NULL        NULL

If the vector being indexed is named, then the names of OOB, missing, or NULL components will be <NA>.
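A few of the out-of-bounds cells can be reproduced with the sketch below. The helper try_index() is hypothetical, introduced only so that an error is reported as the string "Error" instead of stopping the script. Only the OOB columns are exercised here.

```r
# Hypothetical helper: evaluate an indexing expression, reporting errors as "Error"
try_index <- function(expr) tryCatch(expr, error = function(e) "Error")

lgl <- c(a = TRUE, b = FALSE)
lst <- list(a = 1, b = 2)

atomic_oob_int <- try_index(lgl[[5]])    # integer index past the end
atomic_oob_chr <- try_index(lgl[["z"]])  # name that does not exist
list_oob_int   <- try_index(lst[[5]])    # integer index past the end
list_oob_chr   <- try_index(lst[["z"]])  # name that does not exist

atomic_oob_int
atomic_oob_chr
list_oob_int
is.null(list_oob_chr)
```

Only the character index on a list returns NULL quietly; the other three cases signal an error.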

The inconsistencies in the table above led to the development of purrr::pluck() and purrr::chuck(). When the element is missing, pluck() always returns NULL (or the value of the .default argument) and chuck() always throws an error. The behaviour of pluck() makes it well suited for indexing into deeply nested data structures where the component you want may not exist (as is common when working with JSON data from web APIs). pluck() also allows you to mix integer and character indices, and provides an alternative default value if an item does not exist:

x <- list(
  a = list(1, 2, 3),
  b = list(3, 4, 5)
)

purrr::pluck(x, "a", 1)
#> [1] 1

purrr::pluck(x, "c", 1)
#> NULL

purrr::pluck(x, "c", 1, .default = NA)
#> [1] NA
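To complement the pluck() calls above, the following sketch contrasts them with chuck() on the same list. It assumes the purrr package is installed; the requireNamespace() guard is there only so the snippet does nothing, rather than failing, when it is not. The variable names are illustrative.

```r
if (requireNamespace("purrr", quietly = TRUE)) {
  x <- list(
    a = list(1, 2, 3),
    b = list(3, 4, 5)
  )

  # Mixing a character index with an integer index
  mixed <- purrr::pluck(x, "b", 2)

  # Missing component: pluck() returns NULL, chuck() signals an error
  plucked <- purrr::pluck(x, "c", 1)
  chucked <- tryCatch(purrr::chuck(x, "c", 1), error = function(e) "error")

  print(mixed)
  print(is.null(plucked))
  print(chucked)
}
```

chuck() is the better fit when a missing component indicates a bug that should stop the program, while pluck() suits data where absence is expected.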

4.3.4 @ and slot()

There are two additional subsetting operators, which are needed for S4 objects: @ (equivalent to $), and slot() (equivalent to [[). @ is more restrictive than $ in that it will return an error if the slot does not exist. These are described in more detail in Chapter 15.
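As a brief sketch of both operators, the example below defines a small S4 class. The class Person and its slots are hypothetical, chosen only for illustration.

```r
library(methods)

# Hypothetical class used only for this illustration
setClass("Person", slots = c(name = "character", age = "numeric"))
p <- new("Person", name = "Ada", age = 36)

# `@` takes the slot name literally, much as `$` does for list elements
p@name

# slot() takes the slot name as a string, so it can come from a variable
which_slot <- "age"
slot(p, which_slot)

# Asking for a slot that does not exist is an error, not NULL
missing_slot <- tryCatch(slot(p, "height"), error = function(e) "error")
missing_slot
```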

4.3.5 Exercises

  1. Brainstorm as many ways as possible to extract the third value from the cyl variable in the mtcars dataset.

  2. Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Then extract the R squared from the model summary (summary(mod)).