6.2 Function fundamentals

To understand functions in R you need to internalise two important ideas:

  • Functions can be broken down into three components: arguments, body, and environment. There are exceptions to every rule, and in this case, there is a small selection of “primitive” base functions that are implemented purely in C.

  • Functions are objects, just as vectors are objects.

6.2.1 Function components

A function has three parts:

  • The formals(), the list of arguments that control how you call the function.

  • The body(), the code inside the function.

  • The environment(), the data structure that determines how the function finds the values associated with the names.

While the formals and body are specified explicitly when you create a function, the environment is specified implicitly, based on where you defined the function. The function environment always exists, but it is only printed when the function isn’t defined in the global environment.

f02 <- function(x, y) {
  # A comment
  x + y
}

formals(f02)
#> $x
#> 
#> 
#> $y

body(f02)
#> {
#>     x + y
#> }

environment(f02)
#> <environment: R_GlobalEnv>
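
To see a function whose environment is not the global environment, you can create one inside another function. The sketch below uses a hypothetical helper, make_adder(), that is not part of the example above and relies only on base R. It compares environments with identical() rather than printing them, because the printed address differs from session to session.

```r
# make_adder() is a hypothetical helper, used only for illustration.
make_adder <- function(n) {
  function(x) x + n
}
add1 <- make_adder(1)

# The environment of add1 is the execution environment of make_adder(),
# not the global environment.
identical(environment(add1), globalenv())
#> [1] FALSE

# That environment holds the binding for n, which add1 uses when called.
ls(environment(add1))
#> [1] "n"
add1(10)
#> [1] 11
```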

I’ll draw functions as in the following diagram. The black dot on the left is the environment. The two blocks to the right are the function arguments. I won’t draw the body, because it’s usually large, and doesn’t help you understand the shape of the function.

Like all objects in R, functions can also possess any number of additional attributes(). One attribute used by base R is srcref, short for source reference. It points to the source code used to create the function. The srcref is used for printing because, unlike body(), it contains code comments and other formatting.

attr(f02, "srcref")
#> function(x, y) {
#>   # A comment
#>   x + y
#> }
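
A srcref is only attached if the source was kept when the code was parsed. In base R this is controlled by the keep.source option, which defaults to TRUE only in interactive sessions. The following sketch makes the difference explicit by parsing the same code twice with parse(). The names code, with_src, and without_src are illustrative assumptions.

```r
code <- "function(x, y) {\n  # A comment\n  x + y\n}"

# Parse and evaluate the same source, with and without keeping the source.
with_src <- eval(parse(text = code, keep.source = TRUE))
without_src <- eval(parse(text = code, keep.source = FALSE))

is.null(attr(with_src, "srcref"))
#> [1] FALSE
is.null(attr(without_src, "srcref"))
#> [1] TRUE

# The srcref retains the comment from the original source text.
src_lines <- as.character(attr(with_src, "srcref"))
any(grepl("# A comment", src_lines, fixed = TRUE))
#> [1] TRUE

# Both functions behave identically when called.
with_src(1, 2)
#> [1] 3
without_src(1, 2)
#> [1] 3
```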

6.2.2 Primitive functions

There is one exception to the rule that a function has three components. Primitive functions, like sum() and [, call C code directly.

sum
#> function (..., na.rm = FALSE)  .Primitive("sum")
`[`
#> .Primitive("[")

They have either type builtin or type special.

typeof(sum)
#> [1] "builtin"
typeof(`[`)
#> [1] "special"

These functions exist primarily in C, not R, so their formals(), body(), and environment() are all NULL:

formals(sum)
#> NULL
body(sum)
#> NULL
environment(sum)
#> NULL
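
As a small sketch of the contrast, the code below compares a primitive with an ordinary R function. It also uses base R's args(), which returns an ordinary function with the same argument list, as one way to inspect the arguments of a primitive. The choice of mean() as the comparison is an assumption made for illustration.

```r
# mean() is written in R; sum() is a primitive.
typeof(mean)
#> [1] "closure"
typeof(sum)
#> [1] "builtin"

# A closure has formals; a primitive does not.
is.null(formals(mean))
#> [1] FALSE
is.null(formals(sum))
#> [1] TRUE

# args() returns a closure whose formals mirror the primitive's arguments.
"na.rm" %in% names(formals(args(sum)))
#> [1] TRUE
```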

Primitive functions are only found in the base package. While they have certain performance advantages, this benefit comes at a price: they are harder to write. For this reason, R-core generally avoids creating them unless there is no other option.

6.2.3 First-class functions

It’s very important to understand that R functions are objects in their own right, a language property often called “first-class functions”. Unlike in many other languages, there is no special syntax for defining and naming a function: you simply create a function object (with function) and bind it to a name with <-:

f01 <- function(x) {
  sin(1 / x ^ 2)
}

While you almost always create a function and then bind it to a name, the binding step is not compulsory. If you choose not to give a function a name, you get an anonymous function. This is useful when it’s not worth the effort to figure out a name:

lapply(mtcars, function(x) length(unique(x)))
Filter(function(x) !is.numeric(x), mtcars)
integrate(function(x) sin(x) ^ 2, 0, pi)

A final option is to put functions in a list:

funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

funs$double(10)
#> [1] 20
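
Because the list elements are ordinary objects, you can also select a function by name at run time or apply every function in the list to the same input. The sketch below redefines funs so that it is self-contained. The variable names choice and results are illustrative.

```r
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

# Select a function by a name stored in a variable, then call it.
choice <- "half"
funs[[choice]](10)
#> [1] 5

# Call every function in the list with the same argument.
results <- vapply(funs, function(f) f(10), numeric(1))
results[["half"]]
#> [1] 5
results[["double"]]
#> [1] 20
```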

In R, you’ll often see functions called closures. This name reflects the fact that R functions capture, or enclose, their environments, which you’ll learn more about in Section 7.4.2.

6.2.4 Invoking a function

You normally call a function by placing its arguments, wrapped in parentheses, after its name: mean(1:10, na.rm = TRUE). But what happens if you have the arguments already in a data structure?

args <- list(1:10, na.rm = TRUE)

You can instead use do.call(), which takes two main arguments: the function to call, and a list containing the function arguments:

do.call(mean, args)
#> [1] 5.5
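
A common use of do.call() is to pass a list of unknown length to a function that takes its inputs through `...`. The sketch below, using made-up data in a list called rows, binds a list of vectors into a matrix with rbind(). It then checks that do.call() gives the same result as the direct call shown above.

```r
# rows is illustrative data: a named list of equal-length vectors.
rows <- list(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# Equivalent to rbind(a = c(1, 2), b = c(3, 4), c = c(5, 6)).
m <- do.call(rbind, rows)
dim(m)
#> [1] 3 2
m["b", ]
#> [1] 3 4

# do.call() with a list of arguments matches the ordinary call.
identical(do.call(mean, list(1:10, na.rm = TRUE)), mean(1:10, na.rm = TRUE))
#> [1] TRUE
```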

We’ll come back to this idea in Section 19.6.

6.2.5 Exercises

  1. Given a name, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

  2. It’s possible (although typically not useful) to call an anonymous function. Which of the two approaches below is correct? Why?

    function(x) 3()
    #> function(x) 3()
    (function(x) 3)()
    #> [1] 3

  3. A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?

  4. What function allows you to tell if an object is a function? What function allows you to tell if a function is a primitive function?

  5. This code makes a list of all functions in the base package.

    objs <- mget(ls("package:base", all.names = TRUE), inherits = TRUE)
    funs <- Filter(is.function, objs)

    Use it to answer the following questions:

    1. Which base function has the most arguments?

    2. How many base functions have no arguments? What’s special about those functions?

    3. How could you adapt the code to find all primitive functions?

  6. What are the three important components of a function?

  7. When does printing a function not show the environment it was created in?