9.7 Base functionals

To finish the chapter, here I provide a survey of important base functionals that do not belong to the map, reduce, or predicate families, and hence have no equivalent in purrr. That does not make them unimportant, but they have a more mathematical or statistical flavour and are generally less useful in data analysis.

9.7.1 Matrices and arrays

map() and friends are specialised to work with one-dimensional vectors. base::apply() is specialised to work with two-dimensional and higher vectors, i.e. matrices and arrays. You can think of apply() as an operation that summarises a matrix or array by collapsing each row or column to a single value. It has four arguments:

  • X, the matrix or array to summarise.

  • MARGIN, an integer vector giving the dimensions to summarise over, 1 = rows, 2 = columns, etc. (The argument name comes from thinking about the margins of a joint distribution.)

  • FUN, a summary function.

  • ... other arguments passed on to FUN.

A typical example of apply() looks like this:

a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)
#> [1]  8.5  9.5 10.5 11.5 12.5
apply(a2d, 2, mean)
#> [1]  3  8 13 18
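
The fourth argument, ..., is easiest to see with a summary function that takes options. The following is a minimal sketch, assuming a copy of a2d (called m here purely for illustration) with one missing value; it also compares apply() with the dedicated base helpers rowMeans() and colMeans().

```r
a2d <- matrix(1:20, nrow = 5)

# m is an illustrative copy of a2d with one missing value
m <- a2d
m[1, 1] <- NA

# Without extra arguments, the NA propagates into the first row mean
with_na <- apply(m, 1, mean)

# Arguments after FUN are forwarded to FUN through `...`
without_na <- apply(m, 1, mean, na.rm = TRUE)

# For means specifically, base R has dedicated row/column functions
same_rows <- all.equal(apply(a2d, 1, mean), rowMeans(a2d))
same_cols <- all.equal(apply(a2d, 2, mean), colMeans(a2d))
```

Here na.rm = TRUE is not an argument of apply() itself; it is simply passed along to mean() on every call.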

You can specify multiple dimensions to MARGIN, which is useful for high-dimensional arrays:

a3d <- array(1:24, c(2, 3, 4))
apply(a3d, 1, mean)
#> [1] 12 13
apply(a3d, c(1, 2), mean)
#>      [,1] [,2] [,3]
#> [1,]   10   12   14
#> [2,]   11   13   15
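
One way to read MARGIN is as the set of dimensions that are kept; FUN is applied over everything else. A short sketch that checks this against the output above (res is just an illustrative name):

```r
a3d <- array(1:24, c(2, 3, 4))
res <- apply(a3d, c(1, 2), mean)

# The result keeps dimensions 1 and 2 of the input
dim(res)

# Each cell is the summary over the remaining (third) dimension
cell <- mean(a3d[1, 2, ])
identical(cell, res[1, 2])

# rowMeans() with dims = 2 averages over the trailing dimension as well
all.equal(res, rowMeans(a3d, dims = 2))
```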

There are three caveats to using apply():

  • Like base::sapply(), you have no control over the output type; it will automatically be simplified to a list, matrix, or vector. However, you usually use apply() with numeric arrays and a numeric summary function so you are less likely to encounter a problem than with sapply().

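  • apply() is also not idempotent in the sense that if the summary function is the identity function, the output is not always the same as the input: results computed over rows are returned as columns, so the row-wise output is the transpose of the input.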

    a1 <- apply(a2d, 1, identity)
    identical(a2d, a1)
    #> [1] FALSE
    a2 <- apply(a2d, 2, identity)
    identical(a2d, a2)
    #> [1] TRUE

  • Never use apply() with a data frame. It always coerces it to a matrix, which will lead to undesirable results if your data frame contains anything other than numbers.

    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    apply(df, 2, mean)
    #> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
    #> returning NA
    #> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
    #> returning NA
    #>  x  y 
    #> NA NA
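
The last two caveats have simple workarounds. This sketch is one possible approach, not the only one: transpose the row-wise result with t(), and treat a data frame as a list of columns rather than passing it to apply(). The names is_num and col_means are illustrative.

```r
a2d <- matrix(1:20, nrow = 5)

# Row-wise results come back as columns, so transpose to recover the input
roundtrip <- t(apply(a2d, 1, identity))
identical(a2d, roundtrip)

# A data frame is a list of columns, so iterate over it column by column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is_num <- vapply(df, is.numeric, logical(1))
col_means <- vapply(df[is_num], mean, numeric(1))
col_means
```

Using vapply() here also addresses the first caveat, because the output type is declared up front instead of being guessed.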

9.7.2 Mathematical concerns

Functionals are very common in mathematics. The limit, the maximum, the roots (the set of points where f(x) = 0), and the definite integral are all functionals: given a function, they return a single number (or vector of numbers). At first glance, these functions don’t seem to fit in with the theme of eliminating loops, but if you dig deeper you’ll find out that they are all implemented using an algorithm that involves iteration.

Base R provides a useful set:

  • integrate() finds the area under the curve defined by f()
  • uniroot() finds where f() hits zero
  • optimise() finds the location of the lowest (or highest) value of f()

The following example shows how functionals might be used with a simple function, sin():

integrate(sin, 0, pi)
#> 2 with absolute error < 2.2e-14
str(uniroot(sin, pi * c(1 / 2, 3 / 2)))
#> List of 5
#>  $ root      : num 3.14
#>  $ f.root    : num 1.22e-16
#>  $ iter      : int 2
#>  $ init.it   : int NA
#>  $ estim.prec: num 6.1e-05
str(optimise(sin, c(0, 2 * pi)))
#> List of 2
#>  $ minimum  : num 4.71
#>  $ objective: num -1
str(optimise(sin, c(0, pi), maximum = TRUE))
#> List of 2
#>  $ maximum  : num 1.57
#>  $ objective: num 1
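
To make the point about iteration concrete, here is a minimal sketch of a root-finding functional built on bisection. The helper bisect() and its arguments are hypothetical names chosen for illustration; this is not how uniroot() is implemented, only a simple algorithm of the same general kind.

```r
# Hypothetical helper: find a root of f in [lower, upper] by bisection
bisect <- function(f, lower, upper, tol = 1e-8, max_iter = 200) {
  f_lower <- f(lower)
  f_upper <- f(upper)
  if (f_lower == 0) return(list(root = lower, iter = 0L))
  if (f_upper == 0) return(list(root = upper, iter = 0L))
  if (f_lower * f_upper > 0) {
    stop("f() must have opposite signs at the interval ends", call. = FALSE)
  }

  # The loop that the functional hides from its caller
  for (i in seq_len(max_iter)) {
    mid <- (lower + upper) / 2
    f_mid <- f(mid)
    if (f_mid == 0 || (upper - lower) / 2 < tol) {
      return(list(root = mid, iter = i))
    }
    # Keep the half-interval in which the sign change occurs
    if (f_lower * f_mid < 0) {
      upper <- mid
    } else {
      lower <- mid
      f_lower <- f_mid
    }
  }
  stop("did not converge within max_iter iterations", call. = FALSE)
}

res <- bisect(sin, pi / 2, 3 * pi / 2)
abs(res$root - pi) < 1e-7
abs(res$root - uniroot(sin, pi * c(1 / 2, 3 / 2))$root) < 1e-3
```

Like uniroot(), bisect() takes a function as input and returns a number, and the for loop lives entirely inside the functional.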

9.7.3 Exercises

  1. How does apply() arrange the output? Read the documentation and perform some experiments.

  2. What do eapply() and rapply() do? Does purrr have equivalents?

  3. Challenge: read about the fixed point algorithm. Complete the exercises using R.