24.3 Checking for existing solutions

Once you’ve organised your code and captured all the variations you can think of, it’s natural to see what others have done. You are part of a large community, and it’s quite possible that someone has already tackled the same problem. Two good places to start are:

  • CRAN task views. If there’s a CRAN task view related to your problem domain, it’s worth looking at the packages listed there.

  • Reverse dependencies of Rcpp, as listed on its CRAN page. Since these packages use C++, they’re likely to be fast.
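The reverse dependencies can also be listed from within R rather than read off the CRAN page. The following is a minimal sketch, not a definitive recipe: `reverse_deps()` is a hypothetical helper name, and the default `db` argument assumes network access and a configured CRAN mirror. It relies on `tools::package_dependencies()` with `reverse = TRUE`.

```r
# Hypothetical helper: list the packages that depend on, import, or link to
# `pkg`. The default `db` queries the live CRAN index, so the default call
# needs network access and a configured CRAN mirror.
reverse_deps <- function(pkg, db = utils::available.packages()) {
  deps <- tools::package_dependencies(
    pkg,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    reverse = TRUE
  )
  sort(deps[[pkg]])
}

# Only run the network query in an interactive session.
if (interactive()) {
  rcpp_users <- reverse_deps("Rcpp")
  length(rcpp_users)    # how many packages build on Rcpp
  head(rcpp_users, 20)  # a first batch to skim
}
```

The result is just a vector of package names, so you still need to read the package descriptions to find those relevant to your bottleneck.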

Otherwise, the challenge is describing your bottleneck in a way that helps you find related problems and solutions. Knowing the name of the problem or its synonyms makes this search much easier. But if you don’t know what it’s called, it’s hard to search for! The best long-term fix is to read widely so that you build up your own vocabulary over time. Alternatively, ask others: talk to your colleagues, brainstorm some possible names, then search on Google and Stack Overflow. It’s often helpful to restrict your search to R-related pages. For Google, try rseek. For Stack Overflow, restrict your search by including the R tag, [R], in your query.

Record all solutions that you find, not just those that immediately appear to be faster. Some solutions might be slower initially, but end up being faster because they’re easier to optimise. You may also be able to combine the fastest parts from different approaches. If you’ve found a solution that’s fast enough, congratulations! Otherwise, read on.
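One way to keep such a record is a small table that stores, for every candidate, whether it gives the right answer and how long it takes. The sketch below uses only base R. The two candidates are toy stand-ins for whatever solutions you have collected, and the names `candidates`, `time_one`, and `record` are illustrative assumptions.

```r
# Toy stand-ins for the solutions you have collected.
candidates <- list(
  mean_builtin = function(x) mean(x),
  sum_over_n   = function(x) sum(x) / length(x)
)

set.seed(1)
x <- runif(1e5)

# 1. Check that every candidate agrees with the first one.
results <- lapply(candidates, function(f) f(x))
agree <- vapply(
  results,
  function(r) isTRUE(all.equal(r, results[[1]])),
  logical(1)
)

# 2. Time each candidate over repeated calls.
time_one <- function(f, reps = 200) {
  unname(system.time(for (i in seq_len(reps)) f(x))["elapsed"])
}
timings <- vapply(candidates, time_one, numeric(1))

# 3. Keep every candidate in the record, including the slower ones.
record <- data.frame(
  candidate = names(candidates),
  agrees = agree,
  elapsed_sec = timings,
  row.names = NULL
)
record[order(record$elapsed_sec), ]
```

`system.time()` is coarse, so for very fast functions a dedicated microbenchmarking package will give more reliable numbers. The point here is only that the slower candidates stay in the table for later optimisation.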

24.3.1 Exercises

  1. What are faster alternatives to lm()? Which are specifically designed to work with larger datasets?

  2. What package implements a version of match() that’s faster for repeated lookups? How much faster is it?

  3. List four functions (not just those in base R) that convert a string into a date-time object. What are their strengths and weaknesses?

  4. Which packages provide the ability to compute a rolling mean?

  5. What are the alternatives to optim()?