20.2 New stats

While most people tend to think of geoms as the main graphic layer to add to plots, the variety of geoms is mostly powered by different stats. It follows that extending stats is one of the most useful ways to extend the capabilities of ggplot2. One of the benefits of stats is that they are purely about data transformations, which most R users are used to doing. As long as the needed behaviour can be encapsulated in a stat, there is no need to fiddle with any calls to grid.

As discussed in the ggplot2 internals chapter, the main logic of a stat is encapsulated in a tiered succession of calls: compute_layer(), compute_panel(), and compute_group(). The default behaviour of compute_layer() is to split the data by the PANEL column, call compute_panel(), and reassemble the results. Likewise, the default behaviour of compute_panel() is to split the panel data by the group column, call compute_group(), and reassemble the results. Thus, it is only necessary to define compute_group(), i.e. how a single group should be transformed, in order to have a working stat. There are numerous examples of stats that override compute_panel() to gain better performance, as doing so allows you to vectorise the computations and avoid an expensive split-combine step. In general, though, it is best to start at the compute_group() level and only move up if the performance turns out to be inadequate.
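To make the compute_group() contract concrete, here is a minimal sketch of a stat that reduces each group to a single point at its centroid. The names StatCentroid and stat_centroid() are hypothetical and chosen only for illustration; the sketch relies on the default compute_layer() and compute_panel() to do the splitting and reassembly.

```r
library(ggplot2)

# Hypothetical stat: collapse each group to the mean of its x and y values.
# Only compute_group() is defined; the inherited compute_layer() and
# compute_panel() split the data by PANEL and group and recombine the results.
StatCentroid <- ggproto("StatCentroid", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# Hypothetical constructor wrapping the stat in a layer.
stat_centroid <- function(mapping = NULL, data = NULL, geom = "point",
                          position = "identity", ..., na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE) {
  layer(
    stat = StatCentroid, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(alpha = 0.3) +
  stat_centroid(size = 4)

# Inspect the data produced by the stat layer: one row per group.
centroids <- layer_data(p, 2)
```

Because colour is constant within each group, the default compute_panel() adds it back to the one-row result, so the centroid points keep the colour of their group without compute_group() having to handle it.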

Outside of the compute_*() functions, the remaining logic is found in the setup_params() and setup_data() functions. These are called before the compute_*() functions and allow the Stat to modify itself in response to the given parameters and data (especially the data, as this is not available when the stat is constructed). The setup_params() function receives the parameters given during construction along with the layer data, and returns a modified list of parameters. The parameters should correspond to argument names in the compute_*() functions in order to be made available. The setup_data() function receives the modified parameters along with the layer data, and returns the modified layer data. It is important that, no matter what modifications happen in setup_data(), the PANEL and group columns remain intact. Sometimes, with related stats, all that is necessary is to make a subclass and provide new setup_params()/setup_data() methods.
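As a sketch of how setup_params() can fill in a parameter from the data, consider a stat that keeps only the observations above a threshold. The names StatAbove, stat_above(), and threshold are hypothetical. If the user does not supply a threshold, setup_params() derives one from the whole layer (here, the mean of y), so every group is filtered against the same value. Note that the parameter name matches an argument of compute_group(), which is what makes it available there.

```r
library(ggplot2)

# Hypothetical stat: keep only rows whose y value exceeds a threshold.
StatAbove <- ggproto("StatAbove", Stat,
  required_aes = c("x", "y"),

  # Called once per layer, before any compute_*() function. The full layer
  # data is available here, so a missing threshold can be derived from it.
  setup_params = function(data, params) {
    if (is.null(params$threshold)) {
      params$threshold <- mean(data$y, na.rm = TRUE)
    }
    params
  },

  # `threshold` is matched by name against the parameter list.
  compute_group = function(data, scales, threshold = NULL) {
    data[data$y > threshold, , drop = FALSE]
  }
)

# Hypothetical constructor; `threshold = NULL` means "derive it from the data".
stat_above <- function(mapping = NULL, data = NULL, geom = "point",
                       position = "identity", ..., threshold = NULL,
                       na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  layer(
    stat = StatAbove, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(threshold = threshold, na.rm = na.rm, ...)
  )
}

# Threshold derived from the data (mean of hwy across the whole layer).
p_default <- ggplot(mpg, aes(displ, hwy, colour = drv)) + stat_above()

# Threshold supplied explicitly by the user.
p_fixed <- ggplot(mpg, aes(displ, hwy)) + stat_above(threshold = 40)
```

Computing the default in setup_params() rather than inside compute_group() matters here: compute_group() only ever sees one group at a time, so a mean calculated there would differ between groups.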

When creating new stats it is often a good idea to provide an accompanying geom_*() constructor, as most users are used to using these rather than stat_*() constructors. Deviations from this rule can be made if there is no obvious default geom for the new stat, or if the stat is intended to offer a slight modification to an existing geom+stat pair.
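A geom_*() constructor for a stat is typically just a call to layer() with the new stat as the default and an existing geom doing the drawing. The following sketch assumes a hypothetical convex hull stat, StatChull, paired with the existing GeomPolygon; the names StatChull and geom_chull() are illustrative only.

```r
library(ggplot2)

# Hypothetical stat: reduce each group to the points on its convex hull.
# chull() returns the row indices of the hull vertices in drawing order.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Hypothetical geom-style constructor: the geom is fixed to GeomPolygon and
# the new stat is the default, so users can write geom_chull() directly.
geom_chull <- function(mapping = NULL, data = NULL, stat = StatChull,
                       position = "identity", ..., na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE) {
  layer(
    geom = GeomPolygon, stat = stat, data = data, mapping = mapping,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_chull(fill = NA, colour = "black")
```

Here a polygon is the obvious default geom for the stat, which is the situation where a geom_*() constructor is most helpful to users.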