24.1 Introduction

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

— Donald Knuth

Once you’ve used profiling to identify a bottleneck, you need to make it faster. It’s difficult to provide general advice on improving performance, but I try my best with four techniques that can be applied in many situations. I’ll also suggest a general strategy for performance optimisation that helps ensure that your faster code is still correct.

It’s easy to get caught up in trying to remove all bottlenecks. Don’t! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don’t spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and optimise only up to that goal. This means you will not eliminate all bottlenecks. Some you will not get to because you’ve met your goal. Others you may need to pass over and accept either because there is no quick and easy solution or because the code is already well optimised and no significant improvement is possible. Accept these possibilities and move on to the next candidate.
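As a rough sketch of what optimising to a goal can look like, the snippet below times a step with base R's `system.time()` and compares the elapsed time with a target. The goal of half a second and the function `slow_step()` are hypothetical placeholders, not recommendations.

```r
# Hypothetical goal: this step should finish within half a second.
goal_seconds <- 0.5

# Placeholder for the code being tuned (an assumption for illustration).
slow_step <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

# system.time() returns user, system, and elapsed times; wall-clock
# (elapsed) time is the one that matters for a goal like this.
elapsed <- system.time(slow_step(1e5))[["elapsed"]]

if (elapsed <= goal_seconds) {
  message("Goal met; stop optimising this step.")
} else {
  message("Still ", round(elapsed - goal_seconds, 2), " seconds over the goal.")
}
```

A single `system.time()` call is noisy, so treat the result as a rough check rather than a precise measurement.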

If you’d like to learn more about the performance characteristics of the R language, I’d highly recommend “Evaluating the Design of the R Language” (Morandat et al. 2012). It draws its conclusions by running a large corpus of R code found in the wild through a modified R interpreter.

Outline

Section 24.2 teaches you how to organise your code to make optimisation as easy, and as bug-free, as possible.
Section 24.3 reminds you to look for existing solutions.
Section 24.4 emphasises the importance of being lazy: often the easiest way to make a function faster is to let it do less work.
Section 24.5 concisely defines vectorisation, and shows you how to make the most of built-in functions.
Section 24.6 discusses the performance perils of copying data.
Section 24.7 pulls all the pieces together into a case study showing how to speed up repeated t-tests by about a thousand times.
Section 24.8 finishes the chapter with pointers to more resources that will help you write fast code.

Prerequisites

We’ll use bench to precisely compare the performance of small self-contained code chunks.

library(bench)
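
For orientation, here is a minimal sketch of the kind of comparison bench makes possible, assuming the bench package is installed. The vector `x` and the two expressions being compared are illustrative choices only.

```r
library(bench)

x <- runif(100)

# Two ways to compute the same square roots. By default, bench::mark()
# checks that every expression returns an equal result before timing them.
results <- bench::mark(
  sqrt(x),
  x^0.5
)

# Keep only the columns that are easiest to read at a glance.
results[c("expression", "min", "median", "itr/sec", "n_gc")]
```

Exact timings vary from machine to machine, so focus on the relative difference between the rows rather than the absolute numbers.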

References

Morandat, Floréal, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. “Evaluating the Design of the R Language.” In European Conference on Object-Oriented Programming, 104–31. Springer. http://r.cs.purdue.edu/pub/ecoop12.pdf.