Introduction

The layered structure of ggplot2 encourages you to design and construct graphics in a structured manner. You’ve learned the basics in the previous chapter, and in this chapter you’ll get a more comprehensive task-based introduction. The goal here is not to exhaustively explore every option of every geom, but instead to show the most important tools for a given task. For more information about individual geoms, along with many more examples illustrating their use, see the documentation.

It is useful to think about the purpose of each layer before it is added. In general, there are three purposes for a layer:

  • To display the data. We plot the raw data for many reasons, relying on our skills at pattern detection to spot gross structure, local structure, and outliers. This layer appears on virtually every graphic. In the earliest stages of data exploration, it is often the only layer.

  • To display a statistical summary of the data. As we develop and explore models of the data, it is useful to display model predictions in the context of the data. Showing the data helps us improve the model, and showing the model helps reveal subtleties of the data that we might otherwise miss. Summaries are usually drawn on top of the data.

  • To add additional metadata: context, annotations, and references. A metadata layer displays background context, annotations that help to give meaning to the raw data, or fixed references that aid comparisons across panels. Metadata can be useful in the background and foreground.

    A map is often used as a background layer with spatial data. Background metadata should be rendered so that it doesn’t interfere with your perception of the data, so is usually displayed underneath the data and formatted so that it is minimally perceptible. That is, if you concentrate on it, you can see it with ease, but it doesn’t jump out at you when you are casually browsing the plot.

    Other metadata is used to highlight important features of the data. If you have added explanatory labels to a couple of inflection points or outliers, then you want to render them so that they pop out at the viewer. In that case, you want this to be the very last layer drawn.

This chapter is broken up into the following sections, each of which deals with a particular graphical challenge. This is not an exhaustive or exclusive categorisation, and there are many other possible ways to break up graphics into different categories. Each geom can be used for many different purposes, especially if you are creative. However, this breakdown should cover many common tasks and help you learn about some of the possibilities.

  • Basic plot types that produce common, ‘named’ graphics like scatterplots and line charts, Section 3.1.

  • Displaying text, Section 7.2.

  • Adding arbitrary additional anotations, Chapter 7.

  • Surface plots to display 3d surfaces in 2d, Section 5.7.

  • Drawing maps, Section 6.

  • Revealing uncertainty and error, with various 1d and 2d intervals, Section 5.1.

  • Weighted data, Section 5.2.

  • In Section 5.3, you’ll learn about the diamonds dataset.

The final three sections use this data to discuss techniques for visualising larger datasets:

  • Displaying distributions, continuous and discrete, 1d and 2d, joint and conditional, Section 5.4.

  • Dealing with overplotting in scatterplots, a challenge with large datasets,
    Section 5.5.

  • Displaying statistical summaries instead of the raw data, Section 5.6.