9.7 Base functionals
To finish up the chapter, here I provide a survey of important base functionals that are not members of the map, reduce, or predicate families, and hence have no equivalent in purrr. This is not to say that they’re not important, but they have more of a mathematical or statistical flavour, and they are generally less useful in data analysis.
9.7.1 Matrices and arrays
map() and friends are specialised to work with one-dimensional vectors. base::apply() is specialised to work with two-dimensional and higher vectors, i.e. matrices and arrays. You can think of apply() as an operation that summarises a matrix or array by collapsing each row or column to a single value. It has four arguments:
X, the matrix or array to summarise.
MARGIN, an integer vector giving the dimensions to summarise over, 1 = rows, 2 = columns, etc. (The argument name comes from thinking about the margins of a joint distribution.)
FUN, a summary function.
..., other arguments passed on to FUN.
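As a small illustration of the ... argument, the sketch below passes na.rm = TRUE through apply() to mean(). The matrix m is a made-up example, not one used elsewhere in this section.

```r
# Hypothetical 2 x 3 matrix with one missing value (illustration only).
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)

# Without extra arguments, the NA propagates into the first row mean.
row_means_raw <- apply(m, 1, mean)
row_means_raw
#> [1] NA  4

# na.rm = TRUE is not an argument of apply(); it is forwarded to mean() via `...`.
row_means_clean <- apply(m, 1, mean, na.rm = TRUE)
row_means_clean
#> [1] 3 4
```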
A typical example of apply() looks like this:
a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)
#> [1] 8.5 9.5 10.5 11.5 12.5
apply(a2d, 2, mean)
#> [1] 3 8 13 18
You can specify multiple dimensions to MARGIN, which is useful for high-dimensional arrays:
a3d <- array(1:24, c(2, 3, 4))
apply(a3d, 1, mean)
#> [1] 12 13
apply(a3d, c(1, 2), mean)
#>      [,1] [,2] [,3]
#> [1,]   10   12   14
#> [2,]   11   13   15
There are two caveats to using apply():
Like base::sapply(), you have no control over the output type; it will automatically be simplified to a list, matrix, or vector. However, you usually use apply() with numeric arrays and a numeric summary function so you are less likely to encounter a problem than with sapply().
apply() is also not idempotent in the sense that if the summary function is the identity operator, the output is not always the same as the input.
a1 <- apply(a2d, 1, identity)
identical(a2d, a1)
#> [1] FALSE
a2 <- apply(a2d, 2, identity)
identical(a2d, a2)
#> [1] TRUE
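The row-wise result differs from the input because apply() stores the value returned for each row as a column of the output, so the matrix comes back transposed. The following sketch, which rebuilds a2d so that it runs on its own, checks this by comparing dimensions and transposing back.

```r
a2d <- matrix(1:20, nrow = 5)

# Each row of a2d (length 4) becomes a column of the result.
a1 <- apply(a2d, 1, identity)
dim(a2d)
#> [1] 5 4
dim(a1)
#> [1] 4 5

# Transposing the result recovers the original values.
same_after_transpose <- all(t(a1) == a2d)
same_after_transpose
#> [1] TRUE
```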
Never use apply() with a data frame. It always coerces it to a matrix, which will lead to undesirable results if your data frame contains anything other than numbers.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
apply(df, 2, mean)
#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(newX[, i], ...): argument is not numeric or logical:
#> returning NA
#>  x  y
#> NA NA
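To see where the NA values come from, it may help to look at the coercion directly. The sketch below shows that the mixed data frame becomes a character matrix, and then shows one possible alternative (an assumption on my part, not a recommendation from this section): summarising only the numeric columns with vapply().

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A data frame with mixed column types becomes an all-character matrix,
# so even the numeric column x can no longer be averaged.
coerced_type <- typeof(as.matrix(df))
coerced_type
#> [1] "character"

# One possible alternative: keep only the numeric columns and iterate over
# them as a list, stating the expected output type explicitly.
is_num <- vapply(df, is.numeric, logical(1))
col_means <- vapply(df[is_num], mean, numeric(1))
col_means
#> x 
#> 2
```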
9.7.2 Mathematical concerns
Functionals are very common in mathematics. The limit, the maximum, the roots (the set of points where f(x) = 0), and the definite integral are all functionals: given a function, they return a single number (or vector of numbers). At first glance, these functions don’t seem to fit in with the theme of eliminating loops, but if you dig deeper you’ll find out that they are all implemented using an algorithm that involves iteration.
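To make the point about hidden iteration concrete, here is a rough sketch of a root finder built on a plain loop. The function bisect_root() and its arguments are hypothetical, written only for illustration; it uses simple bisection, whereas the base R root finder uses a more sophisticated method.

```r
# Hypothetical helper: find a root of f in [lower, upper] by bisection.
# Assumes f is continuous and changes sign over the interval.
bisect_root <- function(f, lower, upper, tol = 1e-8, max_iter = 100) {
  f_lower <- f(lower)
  if (sign(f_lower) == sign(f(upper))) {
    stop("f must have opposite signs at the interval end points")
  }
  for (i in seq_len(max_iter)) {
    mid <- (lower + upper) / 2
    f_mid <- f(mid)
    # Stop once the bracket is narrow enough or an exact zero is hit.
    if (f_mid == 0 || (upper - lower) / 2 < tol) {
      return(mid)
    }
    # Keep the half of the interval where the sign change occurs.
    if (sign(f_mid) == sign(f_lower)) {
      lower <- mid
      f_lower <- f_mid
    } else {
      upper <- mid
    }
  }
  mid
}

# The caller passes a function and gets a number back; the loop stays hidden.
root <- bisect_root(sin, pi / 2, 3 * pi / 2)
```

The result should be close to pi, the only root of sin() in that interval.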
Base R provides a useful set:
integrate() finds the area under the curve defined by f()
uniroot() finds where f() hits zero
optimise() finds the location of the lowest (or highest) value of f()
The following example shows how functionals might be used with a simple function, sin():
integrate(sin, 0, pi)
#> 2 with absolute error < 2.2e-14
str(uniroot(sin, pi * c(1 / 2, 3 / 2)))
#> List of 5
#> $ root : num 3.14
#> $ f.root : num 1.22e-16
#> $ iter : int 2
#> $ init.it : int NA
#> $ estim.prec: num 6.1e-05
str(optimise(sin, c(0, 2 * pi)))
#> List of 2
#> $ minimum : num 4.71
#> $ objective: num -1
str(optimise(sin, c(0, pi), maximum = TRUE))
#> List of 2
#> $ maximum : num 1.57
#> $ objective: num 1
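The same three functionals also accept functions you write yourself. The sketch below uses anonymous functions chosen purely for illustration; printed output is omitted because uniroot() and optimise() return approximations whose trailing digits depend on their tolerance settings.

```r
# Area under the standard normal density between -1.96 and 1.96
# (roughly 0.95).
area <- integrate(dnorm, -1.96, 1.96)$value

# Root of x^2 - 2 between 0 and 2, i.e. an approximation to sqrt(2).
root <- uniroot(function(x) x^2 - 2, c(0, 2))$root

# Location of the minimum of (x - 1)^2 between -3 and 3 (close to 1).
argmin <- optimise(function(x) (x - 1)^2, c(-3, 3))$minimum

c(area = area, root = root, argmin = argmin)
```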
9.7.3 Exercises
How does apply() arrange the output? Read the documentation and perform some experiments.
What do eapply() and rapply() do? Does purrr have equivalents?
Challenge: read about the fixed point algorithm. Complete the exercises using R.