13.2 Basics

An S3 object is a base type with at least a class attribute (other attributes may be used to store other data). For example, take the factor. Its base type is the integer vector, it has a class attribute of “factor”, and a levels attribute that stores the possible levels:

f <- factor(c("a", "b", "c"))

typeof(f)
#> [1] "integer"
attributes(f)
#> $levels
#> [1] "a" "b" "c"
#> 
#> $class
#> [1] "factor"

You can get the underlying base type by unclass()ing it, which strips the class attribute, causing it to lose its special behaviour:

unclass(f)
#> [1] 1 2 3
#> attr(,"levels")
#> [1] "a" "b" "c"

An S3 object behaves differently from its underlying base type whenever it’s passed to a generic (short for generic function). The easiest way to tell if a function is a generic is to use sloop::ftype() and look for “generic” in the output:

ftype(print)
#> [1] "S3"      "generic"
ftype(str)
#> [1] "S3"      "generic"
ftype(unclass)
#> [1] "primitive"

A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument). Many base R functions are generic, including the important print():

print(f)
#> [1] a b c
#> Levels: a b c

# stripping class reverts to integer behaviour
print(unclass(f))
#> [1] 1 2 3
#> attr(,"levels")
#> [1] "a" "b" "c"

Beware that str() is generic, and some S3 classes use that generic to hide the internal details. For example, the POSIXlt class used to represent date-time data is actually built on top of a list, a fact which is hidden by its str() method:

time <- strptime(c("2017-01-01", "2020-05-04 03:21"), "%Y-%m-%d")
str(time)
#>  POSIXlt[1:2], format: "2017-01-01" "2020-05-04"

str(unclass(time))
#> List of 11
#>  $ sec   : num [1:2] 0 0
#>  $ min   : int [1:2] 0 0
#>  $ hour  : int [1:2] 0 0
#>  $ mday  : int [1:2] 1 4
#>  $ mon   : int [1:2] 0 4
#>  $ year  : int [1:2] 117 120
#>  $ wday  : int [1:2] 0 1
#>  $ yday  : int [1:2] 0 124
#>  $ isdst : int [1:2] 0 0
#>  $ zone  : chr [1:2] "UTC" "UTC"
#>  $ gmtoff: int [1:2] NA NA

The generic is a middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a method, and the generic finds that method by performing method dispatch.

You can use sloop::s3_dispatch() to see the process of method dispatch:

s3_dispatch(print(f))
#> => print.factor
#>  * print.default

We’ll come back to the details of dispatch in Section 13.4.1, for now note that S3 methods are functions with a special naming scheme, generic.class(). For example, the factor method for the print() generic is called print.factor(). You should never call the method directly, but instead rely on the generic to find it for you.

Generally, you can identify a method by the presence of . in the function name, but there are a number of important functions in base R that were written before S3, and hence use . to join words. If you’re unsure, check with sloop::ftype():

ftype(t.test)
#> [1] "S3"      "generic"
ftype(t.data.frame)
#> [1] "S3"     "method"

Unlike most functions, you can’t see the source code for most S3 methods45 just by typing their names. That’s because S3 methods are not usually exported: they live only inside the package, and are not available from the global environment. Instead, you can use sloop::s3_get_method(), which will work regardless of where the method lives:

weighted.mean.Date
#> Error in eval(expr, envir, enclos): object 'weighted.mean.Date' not found

s3_get_method(weighted.mean.Date)
#> function (x, w, ...) 
#> .Date(weighted.mean(unclass(x), w, ...))
#> <bytecode: 0x55c11ac6f808>
#> <environment: namespace:stats>

13.2.1 Exercises

  1. Describe the difference between t.test() and t.data.frame(). When is each function called?

  2. Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

  3. What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

  4. Describe the difference in behaviour in these two calls.

    set.seed(1014)
    some_days <- as.Date("2017-01-31") + sample(10, 5)
    
    mean(some_days)
    #> [1] "2017-02-06"
    mean(unclass(some_days))
    #> [1] 17203
  5. What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- ecdf(rpois(100, 10))
    x
    #> Empirical CDF 
    #> Call: ecdf(rpois(100, 10))
    #>  x[1:18] =  2,  3,  4,  ..., 2e+01, 2e+01
  6. What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- table(rpois(100, 5))
    x
    #> 
    #>  1  2  3  4  5  6  7  8  9 10 
    #>  7  5 18 14 15 15 14  4  5  3

  1. The exceptions are methods found in the base package, like t.data.frame, and methods that you’ve created.↩︎