24.6 Avoiding copies
A pernicious source of slow R code is growing an object with a loop. Whenever you use c(), append(), cbind(), rbind(), or paste() to create a bigger object, R must first allocate space for the new object and then copy the old object to its new home. If you’re repeating this many times, like in a for loop, this can be quite expensive. You’ve entered Circle 2 of the R inferno.
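As a minimal sketch of the difference, the two functions below compute the same vector of squares. The names squares_grow() and squares_prealloc() are hypothetical helpers invented for this illustration. The first grows its result with c() on every iteration, while the second allocates the full result once and fills it in.

```r
# Hypothetical helper: grows the result one element at a time with c().
squares_grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    # c() allocates a new, longer vector and copies the old elements into it
    out <- c(out, i^2)
  }
  out
}

# Hypothetical helper: allocates the full result once, then fills it in.
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Both approaches return the same values; only the amount of copying differs.
identical(squares_grow(1000), squares_prealloc(1000))
```

Wrapping each call in system.time() or bench::mark() with a larger n should make the gap visible, although the exact timings depend on the machine and R version.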
You saw one example of this type of problem in Section 23.2.2, so here I’ll show a slightly more complex example of the same basic issue. We first generate some random strings, and then combine them either iteratively with a loop using collapse(), or in a single pass using paste(). Note that the performance of collapse() gets relatively worse as the number of strings grows: combining 100 strings takes more than 20 times longer than combining 10 strings, whereas paste() takes only about 7 times longer.
random_string <- function() {
  paste(sample(letters, 50, replace = TRUE), collapse = "")
}
strings10 <- replicate(10, random_string())
strings100 <- replicate(100, random_string())

collapse <- function(xs) {
  out <- ""
  for (x in xs) {
    out <- paste0(out, x)
  }
  out
}

bench::mark(
  loop10  = collapse(strings10),
  loop100 = collapse(strings100),
  vec10   = paste(strings10, collapse = ""),
  vec100  = paste(strings100, collapse = ""),
  check = FALSE
)[c("expression", "min", "median", "itr/sec", "n_gc")]
#> # A tibble: 4 x 4
#> expression min median `itr/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl>
#> 1 loop10 27.95µs 30.31µs 32282.
#> 2 loop100 681.4µs 706.48µs 1410.
#> 3 vec10 5.68µs 5.93µs 163946.
#> 4 vec100 37.85µs 39.03µs 25399.
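A rough way to see why the loop scales worse is to count how many characters get written in total. The sketch below assumes a deliberately crude cost model, in which every paste0() call writes out the whole accumulated string; chars_copied_loop() and chars_copied_vec() are hypothetical helpers for this illustration, not measurements.

```r
# Hypothetical cost model: iteration i of the loop writes a string of
# i * width characters, so the total is width * (1 + 2 + ... + n).
chars_copied_loop <- function(n, width = 50) {
  sum(seq_len(n) * width)
}

# A single-pass paste() writes each input string once.
chars_copied_vec <- function(n, width = 50) {
  n * width
}

# Growth in work when going from 10 to 100 strings under this model.
chars_copied_loop(100) / chars_copied_loop(10)
chars_copied_vec(100) / chars_copied_vec(10)
```

Under this model the loop does quadratic work while the single pass does linear work. The measured ratios above are smaller than the model predicts, so treat it only as an illustration of the trend.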
Modifying an object in a loop, e.g., x[i] <- y, can also create a copy, depending on the class of x. Section 2.5.1 discusses this issue in more depth and gives you some tools to determine when you’re making copies.
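One base R tool for spotting copies is tracemem(), which prints a message whenever the traced object is copied. The following is a minimal sketch rather than a full diagnostic workflow. It assumes your build of R supports memory profiling, which the code checks with capabilities("profmem"), and the variable names are made up for the illustration.

```r
x <- c(1, 2, 3)

# tracemem() only works when R was built with memory profiling support.
tracing <- capabilities("profmem")
if (tracing) tracemem(x)

y <- x        # no copy yet: x and y refer to the same vector
y[[2]] <- 10  # modifying y forces a copy, which tracemem() reports

if (tracing) untracemem(x)

x  # the original vector is unchanged
y  # only the copy was modified
```

If tracing is available, the modification of y should produce a line starting with tracemem[ that shows the old and new memory addresses. Running the same check inside a loop is one way to find out whether each iteration copies the object.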