14.4 Reference semantics

One of the big differences between R6 objects and most other R objects is that R6 objects have reference semantics. The primary consequence of reference semantics is that objects are not copied when modified:

y1 <- Accumulator$new() 
y2 <- y1

y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
#> y1 y2 
#> 10 10

Instead, if you want a copy, you’ll need to explicitly $clone() the object:

y1 <- Accumulator$new() 
y2 <- y1$clone()

y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
#> y1 y2 
#> 10  0

($clone() does not recursively clone nested R6 objects. If you want that, you’ll need to use $clone(deep = TRUE).)
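The difference between a shallow and a deep clone only shows up when a field is itself an R6 object. The following is a minimal sketch; Inner and Outer are hypothetical classes invented for this illustration, and it assumes the R6 package is installed.

```r
library(R6)

# Hypothetical classes, used only to illustrate nested R6 objects
Inner <- R6Class("Inner", list(value = 0))
Outer <- R6Class("Outer", list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

a <- Outer$new()
shallow <- a$clone()            # copies a, but shares a$inner
deep <- a$clone(deep = TRUE)    # copies a and a$inner

a$inner$value <- 10

# a and shallow both report 10 because they share one Inner object;
# deep still reports 0 because it has its own copy
c(a = a$inner$value, shallow = shallow$inner$value, deep = deep$inner$value)
```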

There are three other less obvious consequences:

It is harder to reason about code that uses R6 objects because you need to understand more context.
It makes sense to think about when an R6 object is deleted, and you can write a $finalize() to complement the $initialize().
If one of the fields is an R6 object, you must create it inside $initialize(), not R6Class().

These consequences are described in more detail below.

14.4.1 Reasoning

Generally, reference semantics makes code harder to reason about. Take this very simple example:

x <- list(a = 1)
y <- list(b = 2)

z <- f(x, y)

For the vast majority of functions, you know that the final line only modifies z.

Take a similar example that uses an imaginary List reference class:

x <- List$new(a = 1)
y <- List$new(b = 2)

z <- f(x, y)

The final line is much harder to reason about: if f() calls methods of x or y, it might modify them as well as z. This is the biggest potential downside of R6 and you should take care to avoid it by writing functions that either return a value, or modify their R6 inputs, but not both. That said, doing both can lead to substantially simpler code in some cases, and we’ll discuss this further in Section 16.3.2.
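The "return a value or modify, but not both" guideline can be sketched as follows. Counter, total() and bump_both() are hypothetical names introduced only for this illustration; the sketch assumes the R6 package is installed.

```r
library(R6)

# Counter is a hypothetical class used only for this illustration
Counter <- R6Class("Counter", list(
  n = 0,
  add = function(by = 1) {
    self$n <- self$n + by
    invisible(self)
  }
))

# Returns a value and leaves its R6 inputs untouched
total <- function(a, b) {
  a$n + b$n
}

# Called only for its side effect: modifies its R6 inputs
# and returns nothing useful
bump_both <- function(a, b) {
  a$add(1)
  b$add(1)
  invisible(NULL)
}

x <- Counter$new()
y <- Counter$new()

z <- total(x, y)   # x and y are unchanged; z is 0
bump_both(x, y)    # x$n and y$n are now both 1
```

Keeping the two kinds of function separate means a reader can tell from the call site whether x and y might have changed.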

14.4.2 Finalizer

One useful property of reference semantics is that it makes sense to think about when an R6 object is finalized, i.e. when it’s deleted. This doesn’t make sense for most objects because copy-on-modify semantics mean that there may be many transient versions of an object, as alluded to in Section 2.6. For example, the following creates two factor objects: the second is created when the levels are modified, leaving the first to be destroyed by the garbage collector.

x <- factor(c("a", "b", "c"))
levels(x) <- c("c", "b", "a")

Since R6 objects are not copied on modify, they are only deleted once, and it makes sense to think about $finalize() as a complement to $initialize(). Finalizers usually play a similar role to on.exit() (as described in Section 6.7.4), cleaning up any resources created by the initializer. For example, the following class wraps up a temporary file, automatically deleting it when the object is finalized.

TemporaryFile <- R6Class("TemporaryFile", list(
  path = NULL,
  initialize = function() {
    self$path <- tempfile()
  },
  finalize = function() {
    message("Cleaning up ", self$path)
    unlink(self$path)
  }
))

The finalize method will be run when the object is deleted (or more precisely, by the first garbage collection after the object has been unbound from all names) or when R exits. This means that the finalizer can be called effectively anywhere in your R code, and therefore it’s almost impossible to reason about finalizer code that touches shared data structures. Avoid these potential problems by only using the finalizer to clean up private resources allocated by the initializer.

tf <- TemporaryFile$new()
rm(tf)
# Force a garbage collection so the finalizer runs now rather than later
invisible(gc())
#> Cleaning up /tmp/Rtmpk73JdI/file155f31d8424bd
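The following sketch makes the clean-up observable by creating the file and checking for it before and after a forced garbage collection. TempFileHolder is a hypothetical name that mirrors TemporaryFile above. It places finalize() in the private list, which the R6 documentation recommends from version 2.4.0 onwards; treat that as an assumption about the installed R6 version.

```r
library(R6)

# TempFileHolder is a hypothetical class mirroring TemporaryFile
TempFileHolder <- R6Class("TempFileHolder",
  public = list(
    path = NULL,
    initialize = function() {
      self$path <- tempfile()
      # Create the file so there is something to clean up
      file.create(self$path)
    }
  ),
  private = list(
    # Assumes R6 >= 2.4.0, where finalizers may be private
    finalize = function() {
      unlink(self$path)
    }
  )
)

tf <- TempFileHolder$new()
path <- tf$path
existed_before <- file.exists(path)

rm(tf)
invisible(gc())   # force a collection so the finalizer runs now
exists_after <- file.exists(path)

# The file should exist before the object is removed and be gone afterwards
c(before = existed_before, after = exists_after)
```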

14.4.3 R6 fields

A final consequence of reference semantics can crop up where you don’t expect it. If you use an instance of an R6 class as the default value of a field, it will be shared across all instances of the class! Take the following code: we want to create a temporary database every time we call TemporaryDatabase$new(), but the current code always uses the same path.

TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  file = TemporaryFile$new(),
  initialize = function() {
    self$con <- DBI::dbConnect(RSQLite::SQLite(), dbname = self$file$path)
  },
  finalize = function() {
    DBI::dbDisconnect(self$con)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] TRUE

(If you’re familiar with Python, this is very similar to the “mutable default argument” problem.)

The problem arises because TemporaryFile$new() is called only once when the TemporaryDatabase class is defined. To fix the problem, we need to make sure it’s called every time that TemporaryDatabase$new() is called, i.e. we need to put it in $initialize():

TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  file = NULL,
  initialize = function() {
    self$file <- TemporaryFile$new()
    self$con <- DBI::dbConnect(RSQLite::SQLite(), dbname = self$file$path)
  },
  finalize = function() {
    DBI::dbDisconnect(self$con)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] FALSE
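The same pitfall and fix can be reproduced without a database. This is a minimal sketch; Inner, Shared and Separate are hypothetical classes invented for the illustration, and it assumes the R6 package is installed.

```r
library(R6)

# Hypothetical class used as the value of a field
Inner <- R6Class("Inner", list(value = 0))

# Pitfall: Inner$new() is evaluated once, when the class is defined
Shared <- R6Class("Shared", list(
  inner = Inner$new()
))

# Fix: Inner$new() is evaluated on every call to Separate$new()
Separate <- R6Class("Separate", list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

s1 <- Shared$new()
s2 <- Shared$new()
s1$inner$value <- 1
s2$inner$value             # 1: s1 and s2 share one Inner object

p1 <- Separate$new()
p2 <- Separate$new()
p1$inner$value <- 1
p2$inner$value             # 0: each instance has its own Inner object
```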

14.4.4 Exercises

Create a class that allows you to write a line to a specified file. You should open a connection to the file in $initialize(), append a line using cat() in $append_line(), and close the connection in $finalize().