25.6 Case studies
The following case studies illustrate some real life uses of C++ to replace slow R code.
25.6.1 Gibbs sampler
The following case study updates an example blogged about by Dirk Eddelbuettel, illustrating the conversion of a Gibbs sampler in R to C++. The R and C++ code shown below is very similar (it only took a few minutes to convert the R version to the C++ version), but runs about 20 times faster on my computer. Dirk’s blog post also shows another way to make it even faster: using the faster random number generator functions in GSL (easily accessible from R through the RcppGSL package) can make it another two to three times faster.
The R code is as follows:
function(N, thin) {
gibbs_r <- matrix(nrow = N, ncol = 2)
mat <- y <- 0
x <-
for (i in 1:N) {
for (j in 1:thin) {
rgamma(1, 3, y * y + 4)
x <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
y <-
} c(x, y)
mat[i, ] <-
}
mat }
This is straightforward to convert to C++. We:
Add type declarations to all variables.
Use
(
instead of[
to index into the matrix.Subscript the results of
rgamma
andrnorm
to convert from a vector into a scalar.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int N, int thin) {
NumericMatrix gibbs_cpp(2);
NumericMatrix mat(N, double x = 0, y = 0;
for(int i = 0; i < N; i++) {
for(int j = 0; j < thin; j++) {
1, 3, 1 / (y * y + 4))[0];
x = rgamma(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
y = rnorm(
}0) = x;
mat(i, 1) = y;
mat(i,
}
return(mat);
}
Benchmarking the two implementations yields:
::mark(
benchgibbs_r(100, 10),
gibbs_cpp(100, 10),
check = FALSE
)#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 gibbs_r(100, 10) 4.46ms 4.59ms 213. 4.97MB 43.1
#> 2 gibbs_cpp(100, 10) 188.2µs 207.57µs 4732. 4.1KB 30.3
25.6.2 R vectorisation versus C++ vectorisation
This example is adapted from “Rcpp is smoking fast for agent-based models in data frames”. The challenge is to predict a model response from three inputs. The basic R version of the predictor looks like:
function(age, female, ily) {
vacc1a <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily
p <- p * if (female) 1.25 else 0.75
p <- max(0, p)
p <- min(1, p)
p <-
p }
We want to be able to apply this function to many inputs, so we might write a vector-input version using a for loop.
function(age, female, ily) {
vacc1 <- length(age)
n <- numeric(n)
out <-for (i in seq_len(n)) {
vacc1a(age[i], female[i], ily[i])
out[i] <-
}
out }
If you’re familiar with R, you’ll have a gut feeling that this will be slow, and indeed it is. There are two ways we could attack this problem. If you have a good R vocabulary, you might immediately see how to vectorise the function (using ifelse()
, pmin()
, and pmax()
). Alternatively, we could rewrite vacc1a()
and vacc1()
in C++, using our knowledge that loops and function calls have much lower overhead in C++.
Either approach is fairly straightforward. In R:
function(age, female, ily) {
vacc2 <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily
p <- p * ifelse(female, 1.25, 0.75)
p <- pmax(0, p)
p <- pmin(1, p)
p <-
p }
(If you’ve worked R a lot you might recognise some potential bottlenecks in this code: ifelse
, pmin
, and pmax
are known to be slow, and could be replaced with p * 0.75 + p * 0.5 * female
, p[p < 0] <- 0
, p[p > 1] <- 1
. You might want to try timing those variations.)
Or in C++:
#include <Rcpp.h>
using namespace Rcpp;
double vacc3a(double age, bool female, bool ily){
double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;
1.25 : 0.75);
p = p * (female ? std::max(p, 0.0);
p = std::min(p, 1.0);
p = return p;
}
// [[Rcpp::export]]
NumericVector vacc3(NumericVector age, LogicalVector female,
LogicalVector ily) {int n = age.size();
NumericVector out(n);
for(int i = 0; i < n; ++i) {
out[i] = vacc3a(age[i], female[i], ily[i]);
}
return out;
}
We next generate some sample data, and check that all three versions return the same values:
1000
n <- rnorm(n, mean = 50, sd = 10)
age <- sample(c(T, F), n, rep = TRUE)
female <- sample(c(T, F), n, prob = c(0.8, 0.2), rep = TRUE)
ily <-
stopifnot(
all.equal(vacc1(age, female, ily), vacc2(age, female, ily)),
all.equal(vacc1(age, female, ily), vacc3(age, female, ily))
)
The original blog post forgot to do this, and introduced a bug in the C++ version: it used 0.004
instead of 0.04
. Finally, we can benchmark our three approaches:
::mark(
benchvacc1 = vacc1(age, female, ily),
vacc2 = vacc2(age, female, ily),
vacc3 = vacc3(age, female, ily)
)#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vacc1 1.7ms 1.76ms 567. 7.86KB 28.8
#> 2 vacc2 50.6µs 54.16µs 17695. 148.84KB 31.3
#> 3 vacc3 15.6µs 16.12µs 60966. 14.48KB 12.2
Not surprisingly, our original approach with loops is very slow. Vectorising in R gives a huge speedup, and we can eke out even more performance (about ten times) with the C++ loop. I was a little surprised that the C++ was so much faster, but it is because the R version has to create 11 vectors to store intermediate results, where the C++ code only needs to create 1.