20.2 New stats
While most people tend to think of geoms as the main graphic layer to add to plots, the variety of geoms is mostly powered by different stats. It follows that extending stats is one of the most useful ways to extend the capabilities of ggplot2. One of the benefits of stats is that they are purely about data transformations, which most R users are used to doing. As long as the needed behaviour can be encapsulated in a stat, there is no need to fiddle with any calls to grid.
As discussed in the ggplot2 internals chapter, the main logic of a stat is encapsulated in a tiered succession of calls: compute_layer(), compute_panel(), and compute_group(). The default behaviour of compute_layer() is to split the data by the PANEL column, call compute_panel(), and reassemble the results. Likewise, the default behaviour of compute_panel() is to split the panel data by the group column, call compute_group(), and reassemble the results. Thus, it is only necessary to define compute_group(), i.e. how a single group should be transformed, in order to have a working stat. There are numerous examples of overriding compute_panel() to gain better performance, as it allows you to vectorise the computations and avoid an expensive split-combine step, but in general it is best to start at the compute_group() level and see if the performance is adequate.
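As a minimal sketch of a stat that defines only compute_group(), the following reduces each group to the points on its convex hull. The name StatChull and the choice of the mpg data are assumptions made for illustration; the splitting by PANEL and group is left to the default compute_layer() and compute_panel().

```r
library(ggplot2)

# Hypothetical stat: keep only the rows that lie on the group's convex hull.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  # Called once per group; `data` holds only the rows of that group.
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Use the stat through an existing geom by passing the ggproto object.
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_polygon(stat = StatChull, fill = NA)
```

Because colour is mapped to drv, the data is split into one group per drive type and each group gets its own hull outline.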
Outside of the compute_*() functions, the remaining logic is found in the setup_params() and setup_data() functions. These are called before the compute_*() functions and allow the Stat to react and modify itself in response to the given parameters and data (especially the data, as this is not available when the stat is constructed). The setup_params() function receives the parameters given during construction along with the layer data, and returns a modified list of parameters. The parameters should correspond to argument names in the compute_*() functions in order to be made available. The setup_data() function receives the modified parameters along with the layer data, and returns the modified layer data. It is important that no matter what modifications happen in setup_data(), the PANEL and group columns remain intact. Sometimes, with related stats, all that is necessary is to make a subclass and provide new setup_params() and setup_data() methods.
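The interplay between the setup functions and compute_group() might look like the sketch below. The stat StatDensityCommon and its bandwidth parameter are hypothetical names chosen for this example: setup_params() picks one bandwidth shared by all groups when the user supplies none, and setup_data() drops unusable rows without touching PANEL or group.

```r
library(ggplot2)

# Hypothetical stat: a density estimate that uses one bandwidth for all groups.
StatDensityCommon <- ggproto("StatDensityCommon", Stat,
  required_aes = "x",
  # Sees the whole layer data, so it can derive a value across groups.
  setup_params = function(data, params) {
    if (!is.null(params$bandwidth)) return(params)
    ok <- is.finite(data$x)
    bws <- vapply(split(data$x[ok], data$group[ok]), bw.nrd0, numeric(1))
    params$bandwidth <- mean(bws)
    params
  },
  # Drop rows with a non-finite x; only rows are removed,
  # so the PANEL and group columns stay intact.
  setup_data = function(data, params) {
    data[is.finite(data$x), , drop = FALSE]
  },
  # `bandwidth` matches the parameter name set in setup_params().
  compute_group = function(data, scales, bandwidth = 1) {
    d <- density(data$x, bw = bandwidth)
    data.frame(x = d$x, y = d$y)
  }
)

p <- ggplot(mpg, aes(displ, colour = drv)) +
  geom_line(stat = StatDensityCommon)
```

Passing bandwidth explicitly, for example geom_line(stat = StatDensityCommon, bandwidth = 0.5), skips the automatic choice because setup_params() returns the parameters unchanged when a value is already present.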
When creating new stats it is often a good idea to provide an accompanying geom_*() constructor, as most users are used to using these rather than stat_*() constructors. Deviations from this rule can be made if there is no obvious default geom for the new stat, or if the stat is intended to offer a slight modification to an existing geom+stat pair.
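One way such a constructor could be written is sketched below. StatCentroid and geom_centroid() are hypothetical names introduced only for this example; the constructor is a thin wrapper around layer() that pairs the new stat with an existing geom, here GeomPoint.

```r
library(ggplot2)

# Hypothetical stat: reduce each group to a single point at its mean x and y.
StatCentroid <- ggproto("StatCentroid", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# Hypothetical geom_*() constructor that defaults to the new stat.
geom_centroid <- function(mapping = NULL, data = NULL, stat = StatCentroid,
                          position = "identity", ..., na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE) {
  layer(
    data = data, mapping = mapping, stat = stat, geom = GeomPoint,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(alpha = 0.3) +
  geom_centroid(size = 4)
```

Users can then add the layer like any other geom, while the stat argument still lets them swap in a different stat if they need to.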