9 Error messages

An error message should start with a general statement of the problem then give a concise description of what went wrong. Consistent use of punctuation and formatting makes errors easier to parse.

(This guide is currently almost entirely aspirational; most of the bad examples come from existing tidyverse code.)

9.1 Problem statement

Every error message should start with a general statement of the problem. It should be concise, but informative. (This is hard!)

It is encouraged to be as informative as possible, but each sentence should be very simple to make localisation and translation possible. A Localization Horror Story: It Could Happen To You is a Good summary of the challenges of localising error messages. You might not support localised messages right now but you should make it as easy as possible to do it in the future.

Ideally each sentence should contain a single phrase, and should only mention one variable quantity. Because we avoid complex sentences, we prefer to lay out information in a bullet list. Start with a list of contextual information, and finish with a list of information about faulty user input. These lists should be prefixed with ℹ and ✖ respectively if UTF-8 is available (and in blue and red if colour is available), or the ASCII * character otherwise.

If the cause of the problem is clear, use “must”:

dplyr::nth(1:10, "x")
#> Error: `n` must be a numeric vector:
#> ✖ You've supplied a character vector.

dplyr::nth(1:10, 1:2)
#> Error: `n` must have length 1
#> ✖ You've supplied a vector of length 2.

Clear cut causes typically involve incorrect types or lengths.

If you cannot state what was expected, use “can’t”:

mtcars %>% pull(b)
#> Error: Can't find column `b` in `.data`.

as_vector(environment())
#> Error: Can't coerce `.x` to a vector.

purrr::modify_depth(list(list(x = 1)), 3, ~ . + 1)
#> Error: Can't find specified `.depth` in `.x`.

The problem statement should use sentence case and end with a full stop.

Use stop(call. = FALSE), rlang::abort(), Rf_errorcall(R_NilValue, ...) to avoid cluttering the error message with the name of the function that generated it. That information is often not informative, and can easily be accessed via traceback() or an IDE equivalent.

Use simple sentences layed out in a bullet list of ℹ and ✖ elements:

The sentences should be short and bulleted:

# Good
vec_slice(letters, 100)
#> Must index an existing element:
#> ℹ There are 26 elements.
#> ✖ You've tried to subset element 100.

# Bad
vec_slice(letters, 100)
#> Must index an existing element.
#> There are 26 elements and you've tried to subset element 100.

The contextual information should be first:

# Good
vec_slice(letters, 100)
#> Must index an existing element:
#> ℹ There are 26 elements.
#> ✖ You've tried to subset element 100.

# Bad
vec_slice(letters, 100)
#> Must index an existing element:
#> ✖ You've tried to subset element 100.
#> ℹ There are 26 elements.

9.2 Error location

Do your best to reveal the location, name, and/or content of the troublesome component. The goal is to make it as easy as possible for the user to find and fix the problem.

# Good
map_int(1:5, ~ "x")
#> Error: Each result must be a single integer:
#> ✖ Result 1 is a character vector.

# Bad
map_int(1:5, ~ "x")
#> Error: Each result must be a single integer

(It is often not easy to identify the exact problem; it may require passing around extra arguments so that error messages generated at a lower-level can know the original source. For frequently used functions, the effort is typically worth it.)

If the source of the error is unclear, avoid pointing the user in the wrong direction by giving an opinion about the source of the error:

# Good
pull(mtcars, b)
#> Error: Can't find column `b` in `.data`.

tibble(x = 1:2, y = 1:3, z = 1)
#> Error: Columns must have consistent lengths: 
#> ✖ Column `x` has length 2
#> ✖ Column `y` has length 3

# Bad: implies one argument at fault
pull(mtcars, b)
#> Error: Column `b` must exist in `.data`

pull(mtcars, b)
#> Error: `.data` must contain column `b`

tibble(x = 1:2, y = 1:3, z = 1)
#> Error: Column `x` must be length 1 or 3, not 2

If there are multiple issues, or an inconsistency revealed across several arguments or items, prefer a bulleted list:

# Good
purrr::reduce2(1:4, 1:2, `+`)
#> Error: `.x` and `.y` must have compatible lengths:
#> ✖ `.x` has length 4
#> ✖ `.y` has length 2

# Bad: harder to scan
purrr::reduce2(1:4, 1:2, `+`)
#> Error: `.x` and `.y` must have compatible lengths: `.x` has length 4 and 
#> `.y` has length 2

If the list of issues might be long, make sure to truncate to only show the first few:

# Good
#> Error: NAs found at 1,000,000 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

If you want to correctly pluralise the error message, consider using ngettext(). See the notes in ?ngettext() for some challenges related to correct translation to other languages.

9.3 Hints

If the source of the error is clear and common, you may want to provide a hint as to how to fix it. If UTF-8 is available, prefix with ℹ (in blue if colour is available):

dplyr::filter(iris, Species = "setosa")
#> Error: Filter specifications must be named.
#> ℹ Did you mean `Species == "setosa"`?

ggplot2::ggplot(ggplot2::aes())
#> Error: Can't plot data with class "uneval". 
#> ℹ Did you accidentally provide the results of aes() to the `data` argument?

Hints should always end in a question mark.

Hints are particularly important if the source of the error is far away from the root cause:

# Bad
mean[[1]]
#> Error in mean[[1]] : object of type 'closure' is not subsettable

# BETTER
mean[[1]]
#> Error: Can't subset a function.

# BEST
mean[[1]]
#> Error: Can't subset a function.
#> ℹ Have you forgotten to define a variable named `mean`?

Good hints are difficult to write because, as above, you want to avoid steering users in the wrong direction. Generally, I avoid writing a hint unless the problem is common, and you can easily find a common pattern of incorrect usage (e.g. by searching StackOverflow).

9.4 Punctuation

Errors should be written in sentence case, and should end in a full stop. Bullets should be formatted similarly; make sure to capitalise the first word (unless it’s an argument or column name).

Prefer the singular in problem statements:

# Good
map_int(1:2, ~ "a")
#> Error: Each result must be coercible to a single integer:
#> ✖ Result 1 is a character vector.

# Bad
map_int(1:2, ~ "a")
#> Error: Results must be coercible to single integers: 
#> ✖ Result 1 is a character vector

If you can detect multiple problems, list up to five. This allows the user to fix multiple problems in a single pass without being overwhelmed by many errors that may have the same source.

# BETTER
map_int(1:10, ~ "a")
#> Error: Each result must be coercible to a single integer:
#> ✖ Result 1 is a character vector
#> ✖ Result 2 is a character vector
#> ✖ Result 3 is a character vector
#> ✖ Result 4 is a character vector
#> ✖ Result 5 is a character vector
#> ... and 5 more problems

Pick a natural connector between problem statement and error location: this may be “, not”, “;”, or “:” depending on the context.
Surround the names of arguments in backticks, e.g. `x`. Use “column” to disambiguate columns and arguments: Column `x`. Avoid “variable”, because it is ambiguous.
Ideally, each component of the error message should be less than 80 characters wide. Do not add manual line breaks to long error messages; they will not look correct if the console is narrower (or much wider) than expected. Instead, use bullets to break up the error into shorter logical components.

9.5 Before and after

More examples gathered from around the tidyverse.

dplyr::filter(mtcars, cyl)
#> BEFORE: Argument 2 filter condition does not evaluate to a logical vector 
#> AFTER:  Each argument must be a logical vector:
#> * Argument 2 (`cyl`) is an integer vector.

tibble::tribble("x", "y")
#> BEFORE: Expected at least one column name; e.g. `~name` 
#> AFTER:  Must supply at least one column name, e.g. `~name`.

ggplot2::ggplot(data = diamonds) + ggplot2::geom_line(ggplot2::aes(x = cut))
#> BEFORE: geom_line requires the following missing aesthetics: y
#> AFTER:  `geom_line()` must have the following aesthetics: `y`.

dplyr::rename(mtcars, cyl = xxx)
#> BEFORE: `xxx` contains unknown variables
#> AFTER:  Can't find column `xxx` in `.data`.

dplyr::arrange(mtcars, xxx)
#> BEFORE: Evaluation error: object 'xxx' not found.
#> AFTER:  Can't find column `xxx` in `.data`.