13.3 Classes

If you have done object-oriented programming in other languages, you may be surprised to learn that S3 has no formal definition of a class: to make an object an instance of a class, you simply set the class attribute. You can do that during creation with structure(), or after the fact with class<-():

# Create and assign class in one step
x <- structure(list(), class = "my_class")

# Create, then set class
x <- list()
class(x) <- "my_class"

You can determine the class of an S3 object with class(x), and see if an object is an instance of a class using inherits(x, "classname").

class(x)
#> [1] "my_class"
inherits(x, "my_class")
#> [1] TRUE
inherits(x, "your_class")
#> [1] FALSE
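This short sketch uses arbitrary class names, as in the examples above. It shows that the two ways of setting the class produce identical objects. It also shows that the class attribute can hold several names, and that inherits() accepts a vector of names. With which = TRUE, inherits() returns the position of each name in class(x), or 0 if the name is absent.

```r
# Two routes to the same object
x1 <- structure(list(), class = "my_class")
x2 <- list()
class(x2) <- "my_class"
identical(x1, x2)
#> [1] TRUE

# The class attribute can hold several names, most specific first
y <- structure(list(), class = c("my_child", "my_class"))
inherits(y, "my_class")
#> [1] TRUE

# which = TRUE gives the position of each name in class(y), 0 if absent
inherits(y, c("your_class", "my_class"), which = TRUE)
#> [1] 0 2
```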

The class name can be any string, but I recommend using only letters and _. Avoid . because (as mentioned earlier) it can be confused with the . separator between a generic name and a class name. When using a class in a package, I recommend including the package name in the class name. That ensures you won’t accidentally clash with a class defined by another package.

S3 has no checks for correctness, which means you can change the class of existing objects:

# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
#> [1] "lm"
print(mod)
#> 
#> Call:
#> lm(formula = log(mpg) ~ log(disp), data = mtcars)
#> 
#> Coefficients:
#> (Intercept)    log(disp)  
#>       5.381       -0.459

# Turn it into a date (?!)
class(mod) <- "Date"

# Unsurprisingly this doesn't work very well
print(mod)
#> Error in as.POSIXlt.Date(x): 'list' object cannot be coerced to type 'double'

If you’ve used other OO languages, this might make you feel queasy, but in practice this flexibility causes few problems. R doesn’t stop you from shooting yourself in the foot, but as long as you don’t aim the gun at your toes and pull the trigger, you won’t have a problem.

To avoid foot-bullet intersections when creating your own class, I recommend that you usually provide three functions:

  • A low-level constructor, new_myclass(), that efficiently creates new objects with the correct structure.

  • A validator, validate_myclass(), that performs more computationally expensive checks to ensure that the object has correct values.

  • A user-friendly helper, myclass(), that provides a convenient way for others to create objects of your class.

You don’t need a validator for very simple classes, and you can skip the helper if the class is for internal use only, but you should always provide a constructor.
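Here is a minimal sketch of the three functions working together. The temperature class, its use of degrees Celsius, and the absolute-zero rule are all hypothetical, chosen only for illustration. The constructor checks types only, the validator checks values, and the helper coerces its input before calling both.

```r
# Hypothetical class: temperatures in degrees Celsius, stored as a double

# Constructor: cheap type checks only
new_temperature <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "temperature")
}

# Validator: more expensive checks on the values themselves
validate_temperature <- function(x) {
  values <- unclass(x)
  if (any(!is.na(values) & values < -273.15)) {
    stop("Temperatures cannot be below absolute zero (-273.15)", call. = FALSE)
  }
  x
}

# Helper: coerce user-friendly input, then construct and validate
temperature <- function(x = double()) {
  x <- as.double(x)
  validate_temperature(new_temperature(x))
}

t1 <- temperature(c(20L, 37L))
class(t1)
#> [1] "temperature"
```

Because the helper coerces with as.double(), users can pass integers even though the constructor itself only accepts doubles.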

13.3.1 Constructors

S3 doesn’t provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. the same base type and the same attributes with the same types). Instead, you must enforce a consistent structure by using a constructor.

The constructor should follow three principles:

  • Be called new_myclass().

  • Have one argument for the base object, and one for each attribute.

  • Check the type of the base object and the types of each attribute.

I’ll illustrate these ideas by creating constructors for base classes (see footnote 1) that you’re already familiar with. To start, let’s make a constructor for the simplest S3 class: Date. A Date is just a double with a single attribute: its class, “Date”. This makes for a very simple constructor:

new_Date <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}

new_Date(c(-1, 0, 1))
#> [1] "1969-12-31" "1970-01-01" "1970-01-02"
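One quick way to confirm that a Date is just a classed double is to strip the class off again. The snippet below is an illustrative check, not part of the original example. typeof() and unclass() expose the underlying double, attributes() shows that class is the only attribute, and the stopifnot() check in the constructor rejects an integer vector.

```r
# new_Date() as defined above, repeated so this snippet runs on its own
new_Date <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}

d <- new_Date(c(-1, 0, 1))

# The underlying data is a plain double vector...
typeof(d)
#> [1] "double"
unclass(d)
#> [1] -1  0  1

# ...and the only attribute is the class
attributes(d)
#> $class
#> [1] "Date"

# The type check rejects integer input
tryCatch(new_Date(1L), error = conditionMessage)
#> [1] "is.double(x) is not TRUE"
```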

The purpose of constructors is to help you, the developer. That means you can keep them simple, and you don’t need to optimise error messages for public consumption. If you expect users to also create objects, you should create a friendly helper function named after the class, e.g. myclass(), which I’ll describe shortly.

A slightly more complicated constructor is that for difftime, which is used to represent time differences. It is again built on a double, but has a units attribute that must take one of a small set of values:

new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))

  structure(x,
    class = "difftime",
    units = units
  )
}

new_difftime(c(1, 10, 3600), "secs")
#> Time differences in secs
#> [1]    1   10 3600
new_difftime(52, "weeks")
#> Time difference of 52 weeks
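Because the units are checked with match.arg(), the constructor also completes unambiguous abbreviations and rejects unknown units. The sketch below shows both behaviours. The exact wording of match.arg()'s error message varies between R versions, so it is not reproduced here.

```r
# new_difftime() as defined above, repeated so this snippet runs on its own
new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))

  structure(x,
    class = "difftime",
    units = units
  )
}

# match.arg() completes an unambiguous abbreviation...
x <- new_difftime(90, "min")
attr(x, "units")
#> [1] "mins"

# ...and throws an error for anything outside the allowed set
tryCatch(new_difftime(1, "years"), error = conditionMessage)
```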

The constructor is a developer function: it will be called in many places, by an experienced user. That means it’s OK to trade a little safety in return for performance, and you should avoid potentially time-consuming checks in the constructor.

13.3.2 Validators

More complicated classes require more complicated checks for validity. Take factors, for example. A constructor only checks that types are correct, making it possible to create malformed factors:

new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  structure(
    x,
    levels = levels,
    class = "factor"
  )
}

new_factor(1:5, "a")
#> Error in as.character.factor(x): malformed factor
new_factor(0:1, "a")
#> Error in as.character.factor(x): malformed factor

Rather than encumbering the constructor with complicated checks, it’s better to put them in a separate function. Doing so allows you to cheaply create new objects when you know that the values are correct, and easily re-use the checks in other places.

validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")

  if (!all(!is.na(values) & values > 0)) {
    stop(
      "All `x` values must be non-missing and greater than zero",
      call. = FALSE
    )
  }

  if (length(levels) < max(values)) {
    stop(
      "There must be at least as many `levels` as possible values in `x`",
      call. = FALSE
    )
  }

  x
}

validate_factor(new_factor(1:5, "a"))
#> Error: There must be at least as many `levels` as possible values in `x`
validate_factor(new_factor(0:1, "a"))
#> Error: All `x` values must be non-missing and greater than zero

This validator function is called primarily for its side-effects (throwing an error if the object is invalid), so you’d expect it to invisibly return its primary input (as described in Section 6.7.2). However, it’s useful for validators to return visibly: a helper that ends with validate_myclass(new_myclass(...)), like the factor() helper in the next section, then returns a result that prints at the console.
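To see what a visible return buys you, here is a small check on a well-formed factor. It repeats new_factor() and validate_factor() from above so that it runs on its own, and uses withVisible() to confirm that the validator's result is visible.

```r
# new_factor() and validate_factor() as defined above, repeated so this runs on its own
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  structure(x, levels = levels, class = "factor")
}

validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")
  if (!all(!is.na(values) & values > 0)) {
    stop("All `x` values must be non-missing and greater than zero", call. = FALSE)
  }
  if (length(levels) < max(values)) {
    stop("There must be at least as many `levels` as possible values in `x`", call. = FALSE)
  }
  x
}

# A well-formed factor passes through unchanged...
f <- validate_factor(new_factor(c(1L, 2L, 1L), c("a", "b")))
f
#> [1] a b a
#> Levels: a b

# ...and the result is visible, so a function ending in the validator autoprints
withVisible(validate_factor(f))$visible
#> [1] TRUE
```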

13.3.3 Helpers

If you want users to construct objects from your class, you should also provide a helper function that makes their life as easy as possible. A helper should always:

  • Have the same name as the class, e.g. myclass().

  • Finish by calling the constructor, and the validator, if it exists.

  • Create carefully crafted error messages tailored towards an end-user.

  • Have a thoughtfully designed user interface with carefully chosen default values and useful conversions.

The last bullet is the trickiest, and it’s hard to give general advice. However, there are three common patterns:

  • Sometimes all the helper needs to do is coerce its inputs to the desired type. For example, new_difftime() is very strict, and violates the usual convention that you can use an integer vector wherever you can use a double vector:

    new_difftime(1:10)
    #> Error in new_difftime(1:10): is.double(x) is not TRUE

    It’s not the job of the constructor to be flexible, so here we create a helper that just coerces the input to a double.

    difftime <- function(x = double(), units = "secs") {
      x <- as.double(x)
      new_difftime(x, units = units)
    }
    
    difftime(1:10)
    #> Time differences in secs
    #>  [1]  1  2  3  4  5  6  7  8  9 10

  • Often, the most natural representation of a complex object is a string. For example, it’s very convenient to specify factors with a character vector. The code below shows a simple version of factor(): it takes a character vector, and guesses that the levels should be the unique values. This is not always correct (since some levels might not be seen in the data), but it’s a useful default.

    factor <- function(x = character(), levels = unique(x)) {
      ind <- match(x, levels)
      validate_factor(new_factor(ind, levels))
    }
    
    factor(c("a", "a", "b"))
    #> [1] a a b
    #> Levels: a b

  • Some complex objects are most naturally specified by multiple simple components. For example, I think it’s natural to construct a date-time by supplying the individual components (year, month, day, etc.). That leads me to this POSIXct() helper, which resembles the existing ISOdatetime() function (see footnote 2):

    POSIXct <- function(year = integer(), 
                        month = integer(), 
                        day = integer(), 
                        hour = 0L, 
                        minute = 0L, 
                        sec = 0, 
                        tzone = "") {
      ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
    }
    
    POSIXct(2020, 1, 1, tzone = "America/New_York")
    #> [1] "2020-01-01 EST"

For more complicated classes, you should feel free to go beyond these patterns to make life as easy as possible for your users.
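The helpers above leave most error reporting to the constructor, whose stopifnot() messages are aimed at developers. As a sketch of the "error messages tailored towards an end-user" point, here is a hypothetical stricter variant of the difftime() helper. The checks and message wording are illustrative assumptions, not part of the original design.

```r
# new_difftime() as defined above, repeated so this snippet runs on its own
new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
  structure(x, class = "difftime", units = units)
}

# Hypothetical stricter helper: checks inputs with end-user messages,
# coerces, then finishes by calling the constructor
difftime <- function(x = double(), units = "secs") {
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not of type ", typeof(x), call. = FALSE)
  }
  if (!is.character(units) || length(units) != 1) {
    stop("`units` must be a single string", call. = FALSE)
  }
  new_difftime(as.double(x), units = units)
}

difftime(1:3, "mins")
#> Time differences in mins
#> [1] 1 2 3
tryCatch(difftime("10"), error = conditionMessage)
#> [1] "`x` must be numeric, not of type character"
```

Using call. = FALSE keeps the message focused on what the user passed rather than on the internal call that failed.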

13.3.4 Exercises

  1. Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

  2. Enhance my factor() helper to have better behaviour when one or more values are not found in levels. What does base::factor() do in this situation?

  3. Carefully read the source code of factor(). What does it do that my constructor does not?

  4. Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

  5. Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What might a helper do?


  1. Recent versions of R have .Date(), .difftime(), .POSIXct(), and .POSIXlt() constructors, but they are internal, not well documented, and do not follow the principles that I recommend.

  2. This helper is not efficient: behind the scenes, ISOdatetime() works by pasting the components into a string and then using strptime(). A more efficient equivalent is available in lubridate::make_datetime().