5.6 Statistical summaries
geom_histogram() and geom_bin2d() each use a familiar geom, geom_bar() and geom_raster() respectively, combined with a new statistical transformation, stat_bin() and stat_bin2d(). stat_bin() and stat_bin2d() combine the data into bins and count the number of observations in each bin. But what if we want a summary other than count? So far, we’ve just used the default statistical transformation associated with each geom. Now we’re going to explore how to use stat_summary_bin() and stat_summary_2d() to compute different summaries.
Let’s start with a couple of examples with the diamonds data. The first example in each pair shows how we can count the number of diamonds in each bin; the second shows how we can compute the average price.
ggplot(diamonds, aes(color)) +
  geom_bar()
ggplot(diamonds, aes(color, price)) +
  geom_bar(stat = "summary_bin", fun = mean)
ggplot(diamonds, aes(table, depth)) +
  geom_bin2d(binwidth = 1, na.rm = TRUE) +
  xlim(50, 70) +
  ylim(50, 70)
ggplot(diamonds, aes(table, depth, z = price)) +
  geom_raster(binwidth = 1, stat = "summary_2d", fun = mean, na.rm = TRUE) +
  xlim(50, 70) +
  ylim(50, 70)
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be
#> shifted. Consider using geom_tile() instead.
#> Warning: Raster pixels are placed at uneven vertical intervals and will be
#> shifted. Consider using geom_tile() instead.
To get more help on the arguments associated with the two transformations, look at the help for stat_summary_bin() and stat_summary_2d(). You can control the size of the bins and the summary functions.
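As a rough illustration of those two controls, the sketch below swaps the mean for the median and sets the bin size explicitly. The choice of median and of a 1 x 1 binwidth is an assumption made for illustration, not a recommendation; it calls stat_summary_2d() directly, which draws with tiles by default.

```r
library(ggplot2)

# Median price within each 1 x 1 cell of (table, depth).
# `fun` sets the summary function; `binwidth` sets the bin size
# (one value per axis).
p <- ggplot(diamonds, aes(table, depth, z = price)) +
  stat_summary_2d(fun = median, binwidth = c(1, 1), na.rm = TRUE) +
  xlim(50, 70) +
  ylim(50, 70)

p
```

Passing bins instead of binwidth fixes the number of bins rather than their width.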
stat_summary_bin() can produce y, ymin and ymax aesthetics, also making it useful for displaying measures of spread. See the docs for more details. You’ll learn more about how geoms and stats interact in Section 13.6.
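One way to display spread with stat_summary_bin() is to pass a function through fun.data that returns y, ymin and ymax together. The sketch below is a minimal example under stated assumptions: median_iqr() is a hypothetical helper written for this illustration (it is not part of ggplot2), and the 0.25 carat binwidth is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical helper: the median as y, the lower and upper quartiles
# as ymin and ymax.
median_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Bin carat into intervals of width 0.25 and draw the median price with
# its interquartile range in each bin.
p <- ggplot(diamonds, aes(carat, price)) +
  stat_summary_bin(fun.data = median_iqr, geom = "pointrange", binwidth = 0.25)

p
```

The same result can be specified piecewise with the fun, fun.min and fun.max arguments if a single helper function feels heavier than needed.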
These summary functions are quite constrained but are often useful for a quick first pass at a problem. If you find them too restrictive, you’ll need to compute the summaries yourself (see R for Data Science, https://r4ds.had.co.nz, for details).
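Doing the summary yourself might look like the following sketch, which assumes the dplyr package is available and reproduces the mean-price-by-colour bar chart from earlier in this section. The name price_by_color and its column names are made up for illustration.

```r
library(ggplot2)
library(dplyr)

# Compute the summary by hand: one row per colour, with the mean price
# and the number of diamonds that went into it.
price_by_color <- diamonds %>%
  group_by(color) %>%
  summarise(mean_price = mean(price), n = n())

# geom_col() draws bars whose heights are taken directly from the data,
# so no statistical transformation is applied at plot time.
p <- ggplot(price_by_color, aes(color, mean_price)) +
  geom_col()

p
```

Because the summary now lives in an ordinary data frame, it can be inspected, filtered (for example, dropping groups with small n) or extended with further columns before plotting.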