25.6 Case studies

The following case studies illustrate some real life uses of C++ to replace slow R code.

25.6.1 Gibbs sampler

The following case study updates an example blogged about by Dirk Eddelbuettel, illustrating the conversion of a Gibbs sampler in R to C++. The R and C++ code shown below is very similar (it only took a few minutes to convert the R version to the C++ version), but runs about 20 times faster on my computer. Dirk’s blog post also shows another way to make it even faster: using the faster random number generator functions in GSL (easily accessible from R through the RcppGSL package) can make it another two to three times faster.

The R code is as follows:

gibbs_r <- function(N, thin) {
mat <- matrix(nrow = N, ncol = 2)
x <- y <- 0

for (i in 1:N) {
for (j in 1:thin) {
x <- rgamma(1, 3, y * y + 4)
y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
}
mat[i, ] <- c(x, y)
}
mat
}

This is straightforward to convert to C++. We:

• Add type declarations to all variables.

• Use ( instead of [ to index into the matrix.

• Subscript the results of rgamma and rnorm to convert from a vector into a scalar.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gibbs_cpp(int N, int thin) {
NumericMatrix mat(N, 2);
double x = 0, y = 0;

for(int i = 0; i < N; i++) {
for(int j = 0; j < thin; j++) {
x = rgamma(1, 3, 1 / (y * y + 4))[0];
y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
}
mat(i, 0) = x;
mat(i, 1) = y;
}

return(mat);
}

Benchmarking the two implementations yields:

bench::mark(
gibbs_r(100, 10),
gibbs_cpp(100, 10),
check = FALSE
)
#> # A tibble: 2 x 6
#>   expression              min   median itr/sec mem_alloc gc/sec
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 gibbs_r(100, 10)     4.46ms   4.59ms      213.    4.97MB     43.1
#> 2 gibbs_cpp(100, 10)  188.2µs 207.57µs     4732.     4.1KB     30.3

25.6.2 R vectorisation versus C++ vectorisation

This example is adapted from “Rcpp is smoking fast for agent-based models in data frames”. The challenge is to predict a model response from three inputs. The basic R version of the predictor looks like:

vacc1a <- function(age, female, ily) {
p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily
p <- p * if (female) 1.25 else 0.75
p <- max(0, p)
p <- min(1, p)
p
}

We want to be able to apply this function to many inputs, so we might write a vector-input version using a for loop.

vacc1 <- function(age, female, ily) {
n <- length(age)
out <- numeric(n)
for (i in seq_len(n)) {
out[i] <- vacc1a(age[i], female[i], ily[i])
}
out
}

If you’re familiar with R, you’ll have a gut feeling that this will be slow, and indeed it is. There are two ways we could attack this problem. If you have a good R vocabulary, you might immediately see how to vectorise the function (using ifelse(), pmin(), and pmax()). Alternatively, we could rewrite vacc1a() and vacc1() in C++, using our knowledge that loops and function calls have much lower overhead in C++.

Either approach is fairly straightforward. In R:

vacc2 <- function(age, female, ily) {
p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily
p <- p * ifelse(female, 1.25, 0.75)
p <- pmax(0, p)
p <- pmin(1, p)
p
}

(If you’ve worked R a lot you might recognise some potential bottlenecks in this code: ifelse, pmin, and pmax are known to be slow, and could be replaced with p * 0.75 + p * 0.5 * female, p[p < 0] <- 0, p[p > 1] <- 1. You might want to try timing those variations.)

Or in C++:

#include <Rcpp.h>
using namespace Rcpp;

double vacc3a(double age, bool female, bool ily){
double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;
p = p * (female ? 1.25 : 0.75);
p = std::max(p, 0.0);
p = std::min(p, 1.0);
return p;
}

// [[Rcpp::export]]
NumericVector vacc3(NumericVector age, LogicalVector female,
LogicalVector ily) {
int n = age.size();
NumericVector out(n);

for(int i = 0; i < n; ++i) {
out[i] = vacc3a(age[i], female[i], ily[i]);
}

return out;
}

We next generate some sample data, and check that all three versions return the same values:

n <- 1000
age <- rnorm(n, mean = 50, sd = 10)
female <- sample(c(T, F), n, rep = TRUE)
ily <- sample(c(T, F), n, prob = c(0.8, 0.2), rep = TRUE)

stopifnot(
all.equal(vacc1(age, female, ily), vacc2(age, female, ily)),
all.equal(vacc1(age, female, ily), vacc3(age, female, ily))
)

The original blog post forgot to do this, and introduced a bug in the C++ version: it used 0.004 instead of 0.04. Finally, we can benchmark our three approaches:

bench::mark(
vacc1 = vacc1(age, female, ily),
vacc2 = vacc2(age, female, ily),
vacc3 = vacc3(age, female, ily)
)
#> # A tibble: 3 x 6
#>   expression      min   median itr/sec mem_alloc gc/sec
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vacc1         1.7ms   1.76ms      567.    7.86KB     28.8
#> 2 vacc2        50.6µs  54.16µs    17695.  148.84KB     31.3
#> 3 vacc3        15.6µs  16.12µs    60966.   14.48KB     12.2

Not surprisingly, our original approach with loops is very slow. Vectorising in R gives a huge speedup, and we can eke out even more performance (about ten times) with the C++ loop. I was a little surprised that the C++ was so much faster, but it is because the R version has to create 11 vectors to store intermediate results, where the C++ code only needs to create 1.