14.1 Scale specification

An important property of ggplot2 is the principle that every aesthetic in your plot is associated with exactly one scale. For instance, when you write:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(colour = class))

ggplot2 adds a default scale for each aesthetic used in the plot, as if you had written:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(colour = class)) +
  scale_x_continuous() + 
  scale_y_continuous() + 
  scale_colour_discrete()

The choice of default scale depends on the aesthetic and the variable type. In this example, hwy is a continuous variable mapped to the y aesthetic, so the default scale is scale_y_continuous(). Similarly, class is discrete, so when it is mapped to the colour aesthetic the default scale becomes scale_colour_discrete().
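You can see which default scales ggplot2 chose by inspecting a built plot. The sketch below uses ggplot2's exported layer_scales() helper, which returns the trained x and y scales for a panel. The ggproto class names checked with inherits() (ScaleDiscrete, ScaleContinuous) are internal details and may change between versions, so treat this as illustrative. Here class goes on the x axis so that the plot has both a discrete and a continuous default position scale.

```r
library(ggplot2)

# class (discrete) on x, hwy (continuous) on y; no scales supplied explicitly
p <- ggplot(mpg, aes(class, hwy)) +
  geom_point()

# layer_scales() builds the plot and returns the x and y scales for panel 1
sc <- layer_scales(p)

# The default x scale is discrete and the default y scale is continuous
inherits(sc$x, "ScaleDiscrete")
inherits(sc$y, "ScaleContinuous")
```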

Specifying these defaults would be tedious so ggplot2 does it for you. But if you want to override the defaults, you’ll need to add the scale yourself, like this:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(colour = class)) + 
  scale_x_continuous("A really awesome x axis label") +
  scale_y_continuous("An amazingly great y axis label")

The use of + to “add” scales to a plot is a little misleading because if you supply two scales for the same aesthetic, the last scale takes precedence. In other words, when you + a scale, you’re not actually adding it to the plot, but overriding the existing scale. This means that the following two specifications are equivalent:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point() + 
  scale_x_continuous("Label 1") +
  scale_x_continuous("Label 2")
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.

ggplot(mpg, aes(displ, hwy)) + 
  geom_point() + 
  scale_x_continuous("Label 2")

Note the message when you add multiple scales for the same aesthetic, which makes it harder to accidentally overwrite an existing scale. If you see this in your own code, you should make sure that you’re only adding one scale to each aesthetic.
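You can confirm that only the last scale survives by asking the built plot for its x scale. This sketch again uses layer_scales(). It reads the scale's name field, which stores the first argument of scale_x_continuous() (the axis title). That field is an internal detail of the scale object rather than a documented interface.

```r
library(ggplot2)

# Two x scales: ggplot2 prints a message and keeps only the second
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous("Label 1") +
  scale_x_continuous("Label 2")

# The surviving x scale carries the name from the last scale added
layer_scales(p)$x$name
```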

If you’re making small tweaks to the scales, you might continue to use the default scales, supplying a few extra arguments. If you want to make more radical changes, you will override the default scales with alternatives:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(colour = class)) +
  scale_x_sqrt() + 
  scale_colour_brewer()

Here scale_x_sqrt() replaces the default x scale with a continuous scale that applies a square-root transformation, and scale_colour_brewer() replaces the default colour scale with one that uses a ColorBrewer palette.
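A replacement scale can do more than relabel an axis. The sketch below uses layer_scales() together with the scale object's get_limits() method, which is an internal ggproto method and not a documented user-facing API. It shows that after scale_x_sqrt(), the x scale is trained on square-root-transformed values of displ, so its limits are the square roots of the data range.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_sqrt() +
  scale_colour_brewer()

# The x scale's limits are stored in transformed (square-root) space
x_limits <- layer_scales(p)$x$get_limits()
x_limits

# Compare with the square roots of the raw data range
sqrt(range(mpg$displ))
```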

The scale functions intended for users all follow a common naming scheme. You’ve probably already figured out the scheme, but to be concrete, it’s made up of three pieces separated by "_":

1. scale
2. The name of the primary aesthetic (e.g., colour, shape or x).
3. The name of the scale (e.g., continuous, discrete, brewer).

The naming structure is often helpful, but it can sometimes be ambiguous. For example, it is immediately clear that scale_x_*() functions apply to the x aesthetic, but it takes a little more thought to recognise that they also govern the behaviour of other aesthetics that describe a horizontal position (e.g., the xmin, xmax, and xend aesthetics). Similarly, while the name scale_colour_continuous() clearly refers to a colour scale for continuous variables, it is less obvious that scale_colour_distiller() is simply a different method for creating colour scales for continuous variables. In this chapter I will try to clarify this structure as much as possible; more generally, the help documentation for each scale function may be helpful.
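Each scale function returns a scale object, so you can inspect that object directly to resolve some of this ambiguity. The sketch below reads the aesthetics field of the returned objects. That field is internal, so treat this as illustrative. The output shows that the x scale lists the other horizontal position aesthetics it governs. It also shows that scale_colour_distiller() returns a continuous colour scale, just like scale_colour_continuous().

```r
library(ggplot2)

# Every horizontal position aesthetic that an x scale governs
x_aes <- scale_x_continuous()$aesthetics
x_aes

# Both functions return continuous scales acting on the colour aesthetic
distiller <- scale_colour_distiller()
default_cont <- scale_colour_continuous()
distiller$aesthetics
inherits(distiller, "ScaleContinuous")
inherits(default_cont, "ScaleContinuous")
```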

Before diving into the details of how scale functions work, it is useful to note that internally all scale functions in ggplot2 belong to one of three fundamental types: continuous scales, discrete scales, and binned scales. Each fundamental type is handled by one of three scale constructor functions: continuous_scale(), discrete_scale() and binned_scale(). Although you should never need to call these constructor functions, they form the organising structure for later sections in this chapter.
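The three fundamental types show up in the classes of the objects that ordinary scale functions return. This is a minimal sketch. It checks the ggproto classes ScaleContinuous, ScaleDiscrete and ScaleBinned, which are internal names in ggplot2. The helper fundamental_type() is hypothetical and written only for this illustration. scale_x_binned() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# One position scale of each fundamental type
examples <- list(
  continuous = scale_x_continuous(),
  discrete   = scale_x_discrete(),
  binned     = scale_x_binned()
)

# Hypothetical helper: name the fundamental class a scale object inherits from
types <- c("ScaleContinuous", "ScaleDiscrete", "ScaleBinned")
fundamental_type <- function(s) {
  types[vapply(types, function(t) inherits(s, t), logical(1))]
}

sapply(examples, fundamental_type)
```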

14.1.1 Exercises

Simplify the following plot specifications to make them easier to understand.

ggplot(mpg, aes(displ)) + 
  scale_y_continuous("Highway mpg") + 
  scale_x_continuous() +
  geom_point(aes(y = hwy))

ggplot(mpg, aes(y = displ, x = class)) + 
  scale_y_continuous("Displacement (l)") + 
  scale_x_discrete("Car type") +
  scale_x_discrete("Type of car") + 
  scale_colour_discrete() + 
  geom_point(aes(colour = drv)) + 
  scale_colour_discrete("Drive\ntrain")

What happens if you pair a discrete variable with a continuous scale? What happens if you pair a continuous variable with a discrete scale?