3.2 Atomic vectors

There are four primary types of atomic vectors: logical, integer, double, and character (which contains strings). Collectively, integer and double vectors are known as numeric vectors[1]. There are two rare types: complex and raw. I won’t discuss them further because complex numbers are rarely needed in statistics, and raw vectors are a special type that’s only needed when handling binary data.

3.2.1 Scalars

Each of the four primary types has a special syntax to create an individual value, also known as a scalar[2]:

  • Logicals can be written in full (TRUE or FALSE), or abbreviated (T or F).

  • Doubles can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) form. There are three special values unique to doubles: Inf, -Inf, and NaN (not a number), all defined by the floating point standard.

  • Integers are written similarly to doubles but must be followed by L[3] (1234L, 1e4L, or 0xcafeL), and cannot contain fractional values.

  • Strings are surrounded by " ("hi") or ' ('bye'). Special characters are escaped with \; see ?Quotes for full details.
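
As a quick check of the literal syntax above, the following sketch asks R for the type of one literal of each kind. It only uses base R, and the particular values are arbitrary:

```r
# One literal per primary type; typeof() reports how R stores each value.
typeof(TRUE)
#> [1] "logical"
typeof(1.23e4)
#> [1] "double"
typeof(1e4L)
#> [1] "integer"
typeof("hi")
#> [1] "character"

# A hexadecimal literal is an ordinary number written in base 16.
0xcafe == 51966
#> [1] TRUE

# The special double values arise naturally from floating point arithmetic.
1 / 0
#> [1] Inf
0 / 0
#> [1] NaN
```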

3.2.2 Making longer vectors with c()

To create longer vectors from shorter ones, use c(), short for combine:

lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")

When the inputs are atomic vectors, c() always creates another atomic vector; i.e. it flattens:

c(c(1, 2), c(3, 4))
#> [1] 1 2 3 4

In diagrams, I’ll depict vectors as connected rectangles, so the above code could be drawn as follows:

You can determine the type of a vector with typeof()[4] and its length with length().

typeof(lgl_var)
#> [1] "logical"
typeof(int_var)
#> [1] "integer"
typeof(dbl_var)
#> [1] "double"
typeof(chr_var)
#> [1] "character"
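
The typeof() calls above have a natural companion in length(). A minimal sketch, using a throwaway variable name (chr_example) that is not part of the original text:

```r
# length() counts elements, not characters or digits.
chr_example <- c("these are", "some strings")
length(chr_example)
#> [1] 2

# A single value is a vector of length one.
length(10)
#> [1] 1

# c() flattens, so the combined length is the sum of the input lengths.
length(c(c(1, 2), c(3, 4)))
#> [1] 4
```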

3.2.3 Missing values

R represents missing, or unknown, values with a special sentinel value: NA (short for not available). Missing values tend to be infectious: most computations involving a missing value will return another missing value.

NA > 5
#> [1] NA
10 * NA
#> [1] NA
!NA
#> [1] NA

There are only a few exceptions to this rule. These occur when some identity holds for all possible inputs:

NA ^ 0
#> [1] 1
NA | TRUE
#> [1] TRUE
NA & FALSE
#> [1] FALSE
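
For contrast, here is a small sketch of near-identical expressions where the result does depend on the unknown value, so NA propagates as usual:

```r
# The answer depends on the missing value, so it stays missing.
NA | FALSE
#> [1] NA
NA & TRUE
#> [1] NA

# x * 0 is not 0 for every double (Inf * 0 is NaN), so this is NA too.
NA * 0
#> [1] NA
Inf * 0
#> [1] NaN
```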

Propagation of missingness leads to a common mistake when determining which values in a vector are missing:

x <- c(NA, 5, NA, 10)
x == NA
#> [1] NA NA NA NA

This result is correct (if a little surprising) because there’s no reason to believe that one missing value has the same value as another. Instead, use is.na() to test for the presence of missingness:

is.na(x)
#> [1]  TRUE FALSE  TRUE FALSE
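
Once you have the logical vector from is.na(), a few common follow-ups become one-liners. This is only a sketch of typical usage, reusing the same x:

```r
x <- c(NA, 5, NA, 10)

# Count the missing values (each TRUE counts as 1).
sum(is.na(x))
#> [1] 2

# Keep only the observed values.
x[!is.na(x)]
#> [1]  5 10

# Many summary functions can drop missing values via na.rm = TRUE.
mean(x)
#> [1] NA
mean(x, na.rm = TRUE)
#> [1] 7.5
```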

NB: Technically there are four missing values, one for each of the atomic types: NA (logical), NA_integer_ (integer), NA_real_ (double), and NA_character_ (character). This distinction is usually unimportant because NA will be automatically coerced to the correct type when needed.
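
The four typed missing values, and the automatic coercion of plain NA, can be observed directly with typeof(). A brief sketch:

```r
typeof(NA)
#> [1] "logical"
typeof(NA_integer_)
#> [1] "integer"
typeof(NA_real_)
#> [1] "double"
typeof(NA_character_)
#> [1] "character"

# Plain NA takes on the type of the vector it is combined with.
typeof(c(NA, 1L))
#> [1] "integer"
typeof(c(NA, "a"))
#> [1] "character"
```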

3.2.4 Testing and coercion

Generally, you can test if a vector is of a given type with an is.*() function, but these functions need to be used with care. is.logical(), is.integer(), is.double(), and is.character() do what you might expect: they test if a vector is a logical, integer, double, or character. Avoid is.vector(), is.atomic(), and is.numeric(): they don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do.
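
To make the warning concrete, the sketch below shows a few cases where the results of these functions may differ from a first guess. The object names (f, v) and the "note" attribute are made up for illustration:

```r
# is.numeric() is TRUE for both integer and double vectors...
is.numeric(1L)
#> [1] TRUE
is.numeric(1.5)
#> [1] TRUE

# ...but FALSE for factors, even though they are stored as integers.
f <- factor(c("a", "b"))
typeof(f)
#> [1] "integer"
is.numeric(f)
#> [1] FALSE

# is.vector() becomes FALSE once a vector has attributes other than names.
v <- c(1, 2, 3)
is.vector(v)
#> [1] TRUE
attr(v, "note") <- "hypothetical attribute"
is.vector(v)
#> [1] FALSE

# is.atomic() ignores attributes: a matrix is still atomic, a list is not.
is.atomic(matrix(1:4, nrow = 2))
#> [1] TRUE
is.atomic(list(1, 2))
#> [1] FALSE
```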

For atomic vectors, type is a property of the entire vector: all elements must be the same type. When you attempt to combine different types, they will be coerced in a fixed order: character → double → integer → logical, so the result takes the leftmost of these types that appears among the inputs. For example, combining a character and a number yields a character:

str(c("a", 1))
#>  chr [1:2] "a" "1"
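
The remaining steps of the coercion order can be checked the same way. A short sketch with arbitrary values:

```r
# logical + integer -> integer
typeof(c(FALSE, 2L))
#> [1] "integer"

# integer + double -> double
typeof(c(2L, 0.5))
#> [1] "double"

# double + character -> character
typeof(c(0.5, "b"))
#> [1] "character"

# Logical values become the strings "TRUE" and "FALSE", not "1" and "0".
c(TRUE, "b")
#> [1] "TRUE" "b"
```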

Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to numeric. This coercion is particularly useful for logical vectors because TRUE becomes 1 and FALSE becomes 0.

x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1

# Total number of TRUEs
sum(x)
#> [1] 1

# Proportion that are TRUE
mean(x)
#> [1] 0.333
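
The same trick works on the result of a comparison, which is a common way to count or summarise values that meet a condition. A minimal sketch with a made-up vector y:

```r
y <- c(2, 7, 4, 9)

# How many values exceed 3?
sum(y > 3)
#> [1] 3

# What proportion exceed 3?
mean(y > 3)
#> [1] 0.75

# Arithmetic on logicals yields integers.
TRUE + TRUE
#> [1] 2
typeof(TRUE + TRUE)
#> [1] "integer"
```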

Generally, you can deliberately coerce by using an as.*() function, like as.logical(), as.integer(), as.double(), or as.character(). Failed coercion of strings generates a warning and a missing value:

as.integer(c("1", "1.5", "a"))
#> Warning: NAs introduced by coercion
#> [1]  1  1 NA
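
A few more deliberate coercions, sketched with arbitrary inputs, show behaviour that is easy to misjudge:

```r
# Doubles are truncated toward zero, not rounded.
as.integer(c(1.9, -1.9))
#> [1]  1 -1

# Only certain spellings are recognised as logical; others become NA.
as.logical(c("TRUE", "false", "T", "yes"))
#> [1]  TRUE FALSE  TRUE    NA

# Numbers to strings, and a string in scientific notation back to a double.
as.character(c(1.5, 2))
#> [1] "1.5" "2"
as.double("1e3")
#> [1] 1000
```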

3.2.5 Exercises

  1. How do you create raw and complex scalars? (See ?raw and ?complex.)

  2. Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():

    c(1, FALSE)
    c("a", 1)
    c(TRUE, 1L)

  3. Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?

  4. Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)

  5. Precisely what do is.atomic(), is.numeric(), and is.vector() test for?


  1. This is a slight simplification as R does not use “numeric” consistently, which we’ll come back to in Section 12.3.1.

  2. Technically, the R language does not possess scalars. Everything that looks like a scalar is actually a vector of length one. This is mostly a theoretical distinction, but it does mean that expressions like 1[1] work.

  3. L is not intuitive, and you might wonder where it comes from. At the time L was added to R, R’s integer type was equivalent to a long integer in C, and C code could use a suffix of l or L to force a number to be a long integer. It was decided that l was too visually similar to i (used for complex numbers in R), leaving L.

  4. You may have heard of the related mode() and storage.mode() functions. Do not use them: they exist only for compatibility with S.