6.4 Lexical scoping

In Chapter 2, we discussed assignment, the act of binding a name to a value. Here we’ll discuss scoping, the act of finding the value associated with a name.

The basic rules of scoping are quite intuitive, and you’ve probably already internalised them, even if you never explicitly studied them. For example, what will the following code return, 10 or 20?[25]

x <- 10
g01 <- function() {
  x <- 20
  x
}

g01()

In this section, you’ll learn the formal rules of scoping as well as some of its more subtle details. A deeper understanding of scoping will help you to use more advanced functional programming tools, and eventually, even to write tools that translate R code into other languages.

R uses lexical scoping[26]: it looks up the values of names based on how a function is defined, not how it is called. “Lexical” here is not the English adjective that means relating to words or a vocabulary. It’s a technical CS term that tells us that the scoping rules use a parse-time, rather than a run-time, structure.
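To make “how a function is defined, not how it is called” concrete, here is a small sketch; the names lookup_x() and call_lookup() are hypothetical. lookup_x() is created in the global environment, so it finds the global x even when it is called from inside a function that has its own x.

```r
x <- "global"

# Hypothetical helper defined at top level: it has no local `x`
lookup_x <- function() x

# Hypothetical caller that defines its own `x` before calling lookup_x()
call_lookup <- function() {
  x <- "inside call_lookup()"
  lookup_x()
}

call_lookup()
#> [1] "global"
```

Under dynamic scoping, which R does not use by default, the same call would have returned the caller’s value instead.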

R’s lexical scoping follows four primary rules:

  • Name masking
  • Functions versus variables
  • A fresh start
  • Dynamic lookup

6.4.1 Name masking

The basic principle of lexical scoping is that names defined inside a function mask names defined outside a function. This is illustrated in the following example.

x <- 10
y <- 20
g02 <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
g02()
#> [1] 1 2

If a name isn’t defined inside a function, R looks one level up.

x <- 2
g03 <- function() {
  y <- 1
  c(x, y)
}
g03()
#> [1] 2 1

# And this doesn't change the previous value of y
y
#> [1] 20
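Function arguments are also names defined inside the function, so they mask outer names in exactly the same way. A small sketch, using the hypothetical function g_arg():

```r
x <- 10

# The argument `x` masks the global `x` inside the body
g_arg <- function(x) x * 2

g_arg(3)
#> [1] 6

# The global binding is untouched
x
#> [1] 10
```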

The same rules apply if a function is defined inside another function. First, R looks inside the current function. Then, it looks where that function was defined (and so on, all the way up to the global environment). Finally, it looks in other loaded packages.

Run the following code in your head, then confirm the result by running the code.[27]

x <- 1
g04 <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
g04()

The same rules also apply to functions created by other functions, which I call manufactured functions, the topic of Chapter 10.
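As a brief preview, here is a sketch of a manufactured function; make_adder() and add_two() are hypothetical names. The returned function has no local n, so R looks where it was defined: the environment created by the call make_adder(2).

```r
# Hypothetical function factory: returns a new function each time it is called
make_adder <- function(n) {
  function(x) x + n
}

add_two <- make_adder(2)

# `n` is found in the environment where add_two() was created
add_two(5)
#> [1] 7

environment(add_two)$n
#> [1] 2
```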

6.4.2 Functions versus variables

In R, functions are ordinary objects. This means the scoping rules described above also apply to functions:

g07 <- function(x) x + 1
g08 <- function() {
  g07 <- function(x) x + 100
  g07(10)
}
g08()
#> [1] 110

However, when a function and a non-function share the same name (they must, of course, reside in different environments), applying these rules gets a little more complicated. When you use a name in a function call, R ignores non-function objects when looking for that value. For example, in the code below, g09 takes on two different values:

g09 <- function(x) x + 100
g10 <- function() {
  g09 <- 10
  g09(g09)
}
g10()
#> [1] 110
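Base R’s get() makes this lookup rule visible: its mode argument restricts the search to objects of a given mode. In the sketch below (g10_lookup() is a hypothetical variant of g10()), an ordinary lookup of g09 finds the local number, while a function-only lookup skips it and finds the global g09(). The call g09(g09) does the same thing implicitly.

```r
g09 <- function(x) x + 100

# Hypothetical variant of g10() that inspects both bindings of `g09`
g10_lookup <- function() {
  g09 <- 10
  # Ordinary lookup stops at the first binding: the local number
  num <- get("g09")
  # mode = "function" skips non-function bindings, as a call does
  fun <- get("g09", mode = "function")
  list(any_mode = num, function_mode = fun)
}

res <- g10_lookup()
res$any_mode
#> [1] 10
res$function_mode(1)
#> [1] 101
```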

For the record, using the same name for different things is confusing and best avoided!

6.4.3 A fresh start

What happens to values between invocations of a function? Consider the example below. What will happen the first time you run this function? What will happen the second time?[28] (If you haven’t seen exists() before, it returns TRUE if there’s a variable with that name and returns FALSE if not.)

g11 <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}

g11()
g11()

You might be surprised that g11() always returns the same value. This happens because every time a function is called, a new environment is created to host its execution. This means that a function has no way to tell what happened the last time it was run; each invocation is completely independent. We’ll see some ways to get around this in Section 10.2.4.
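You can observe the fresh environment directly: environment(), called inside a function body, returns the current execution environment. In this sketch (g_env() is a hypothetical name), two calls return two distinct environments.

```r
# Hypothetical function that returns its own execution environment
g_env <- function() environment()

e1 <- g_env()
e2 <- g_env()

# Each call created a brand-new environment
identical(e1, e2)
#> [1] FALSE
```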

6.4.4 Dynamic lookup

Lexical scoping determines where, but not when, to look for values. R looks for values when the function is run, not when the function is created. Together, these two properties tell us that the output of a function can differ depending on the objects outside the function’s environment:

g12 <- function() x + 1
x <- 15
g12()
#> [1] 16

x <- 20
g12()
#> [1] 21

This behaviour can be quite annoying. If you make a spelling mistake in your code, you won’t get an error message when you create the function. And depending on the variables defined in the global environment, you might not even get an error message when you run the function.
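A minimal sketch of that failure mode, using a hypothetical function double_price() whose body misspells its own argument:

```r
# Hypothetical function with a typo: `prce` should be `price`
double_price <- function(price) prce * 2

# Creating the function raised no error. Now suppose a global
# variable happens to have the misspelled name:
prce <- 50

# The typo silently picks up the global value: 100, not 200
double_price(100)
#> [1] 100
```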

To detect this problem, use codetools::findGlobals(). This function lists all the external dependencies (unbound symbols) within a function:

codetools::findGlobals(g12)
#> [1] "+" "x"
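findGlobals() also takes a merge argument; with merge = FALSE it returns a list that separates global functions from global variables, which can make a misspelt variable easier to spot. Applied to a hypothetical function with a typo (prce instead of price):

```r
# Hypothetical function with a typo: `prce` should be `price`
double_price <- function(price) prce * 2

# merge = FALSE splits the result into functions and variables
globals <- codetools::findGlobals(double_price, merge = FALSE)
globals$variables
#> [1] "prce"
globals$functions
#> [1] "*"
```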

To solve this problem, you can manually change the function’s environment to the emptyenv(), an environment which contains nothing:

environment(g12) <- emptyenv()
g12()
#> Error in x + 1: could not find function "+"
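A gentler variant, offered here as an alternative rather than the approach described above, is to use baseenv() instead. Base functions such as + stay visible, but variables in the global environment do not, so the error now points at the real external dependency, x.

```r
x <- 15
g12 <- function() x + 1

# baseenv() contains base R (including `+`), and its parent is
# emptyenv(), so the global environment is no longer searched
environment(g12) <- baseenv()

msg <- tryCatch(g12(), error = function(e) conditionMessage(e))
msg
#> [1] "object 'x' not found"
```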

The problem and its solution reveal why this seemingly undesirable behaviour exists: R relies on lexical scoping to find everything, from the obvious, like mean(), to the less obvious, like + or even {. This gives R’s scoping rules a rather beautiful simplicity.
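Because + and { are looked up like any other name, a local binding can mask them too. The sketch below (g_plus() is a hypothetical name) is purely illustrative; masking base operators in real code is best avoided.

```r
# `+` and `{` are ordinary functions, found by the same lookup rules
`+`(1, 2)
#> [1] 3
is.function(`{`)
#> [1] TRUE

# Hypothetical function that masks `+` with multiplication locally
g_plus <- function() {
  `+` <- function(e1, e2) e1 * e2
  3 + 4
}
g_plus()
#> [1] 12
```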

6.4.5 Exercises

  1. What does the following code return? Why? Describe how each of the three c’s is interpreted.

    c <- 10
    c(c = c)

  2. What are the four principles that govern how R looks for values?

  3. What does the following function return? Make a prediction before running the code yourself.

    f <- function(x) {
      f <- function(x) {
        f <- function() {
          x ^ 2
        }
        f() + 1
      }
      f(x) * 2
    }
    f(10)

  25. I’ll “hide” the answers to these challenges in the footnotes. Try solving them before looking at the answer; this will help you to better remember the correct answer. In this case, g01() will return 20.

  26. Functions that automatically quote one or more arguments can override the default scoping rules to implement other varieties of scoping. You’ll learn more about that in Chapter 20.

  27. g04() returns c(1, 2, 3).

  28. g11() returns 1 every time it’s called.