2.1 Introduction

In R, it is important to understand the distinction between an object and its name. Doing so will help you:

  • More accurately predict the performance and memory usage of your code.
  • Write faster code by avoiding accidental copies, a major source of slow code.
  • Better understand R’s functional programming tools.

The goal of this chapter is to help you understand the distinction between names and values, and when R will copy an object.

Quiz

Answer the following questions to see if you can safely skip this chapter. You can find the answers at the end of the chapter in Section 2.7.

  1. Given the following data frame, how do I create a new column called “3” that contains the sum of the columns named 1 and 2? You may only use $, not [[. What makes 1, 2, and 3 challenging as variable names?

    df <- data.frame(runif(3), runif(3))
    names(df) <- c(1, 2)

  2. In the following code, how much memory does y occupy?

    x <- runif(1e6)
    y <- list(x, x, x)

  3. On which line does a get copied in the following example?

    a <- c(1, 5, 3, 2)
    b <- a
    b[[1]] <- 10

Outline

  • Section 2.2 introduces you to the distinction between names and values, and discusses how <- creates a binding, or reference, between a name and a value.

  • Section 2.3 describes when R makes a copy: whenever you modify a vector, you’re almost certainly creating a new, modified vector. You’ll learn how to use tracemem() to figure out when a copy actually occurs. Then you’ll explore the implications as they apply to function calls, lists, data frames, and character vectors.

  • Section 2.4 explores the implications of the previous two sections on how much memory an object occupies. Since your intuition may be profoundly wrong and since utils::object.size() does not account for shared references and so can report sizes that are too large, you’ll learn how to use lobstr::obj_size().

  • Section 2.5 describes the two important exceptions to copy-on-modify: environments and values with a single name are modified in place.

  • Section 2.6 concludes the chapter with a discussion of the garbage collector, which frees up the memory used by objects no longer referenced by a name.
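
As a small preview of the copy tracking mentioned for Section 2.3, the following is a minimal sketch using only base R. The names v, w, and can_trace are hypothetical and chosen just for this illustration. tracemem() only works when R was built with memory profiling, so the sketch checks capabilities("profmem") first; that guard is an assumption about your setup, not something the chapter prescribes.

```r
# Hypothetical example vector; `v` and `w` are not names used in the chapter.
v <- c(10, 20, 30)

# tracemem() needs R compiled with memory profiling; check before calling it.
can_trace <- capabilities("profmem")

if (can_trace) tracemem(v)    # start reporting copies of the object bound to v
w <- v                        # a second name is bound to the same object
w[[2]] <- 99                  # modifying through w is where a copy is reported
if (can_trace) untracemem(v)  # stop tracing

v  # the object bound to v is unchanged
w  # w is bound to the modified copy
```

When tracing is available, the message from tracemem() should appear at the modification step, not at the assignment w <- v. Results can differ inside some IDEs, which may hold extra references to objects.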

Prerequisites

We’ll use the lobstr package to dig into the internal representation of R objects.

library(lobstr)
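
To check that the package is installed and to get a first feel for what it offers, here is a minimal sketch, assuming lobstr is available in your library. The names p and q are hypothetical and used only for this illustration. obj_addr() and obj_size() are the lobstr functions that report an object's address and size.

```r
library(lobstr)

# Hypothetical example values; `p` and `q` are not names used in the chapter.
p <- c(1.5, 2.5, 3.5)
q <- p

# Both names are bound to the same object, so the two addresses are equal.
same_object <- obj_addr(p) == obj_addr(q)
same_object

# obj_size() reports how much memory the object occupies.
obj_size(p)
```

The addresses themselves change every time R is restarted, so only the comparison is meaningful here.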

Sources

The details of R’s memory management are not documented in a single place. Much of the information in this chapter was gleaned from a close reading of the documentation (particularly ?Memory and ?gc), the memory profiling section of Writing R Extensions (R Core Team 2018b), and the SEXPs section of R Internals (R Core Team 2018a). The rest I figured out by reading the C source code, performing small experiments, and asking questions on R-devel. Any mistakes are entirely mine.

References

R Core Team. 2018a. “R Internals.” R Foundation for Statistical Computing. https://cran.r-project.org/doc/manuals/r-devel/R-ints.html.

R Core Team. 2018b. “Writing R Extensions.” R Foundation for Statistical Computing. https://cran.r-project.org/doc/manuals/r-devel/R-exts.html.