3.2 Atomic vectors
There are four primary types of atomic vectors: logical, integer, double, and character (which contains strings). Collectively, integer and double vectors are known as numeric vectors [12]. There are two rare types: complex and raw. I won’t discuss them further because complex numbers are rarely needed in statistics, and raw vectors are a special type that’s only needed when handling binary data.
3.2.1 Scalars
Each of the four primary types has a special syntax to create an individual value, AKA a scalar [13]:
Logicals can be written in full (TRUE or FALSE), or abbreviated (T or F).
Doubles can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) form. There are three special values unique to doubles: Inf, -Inf, and NaN (not a number). These are special values defined by the floating point standard.
Integers are written similarly to doubles but must be followed by L [14] (1234L, 1e4L, or 0xcafeL), and cannot contain fractional values.
Strings are surrounded by " ("hi") or ' ('bye'). Special characters are escaped with \; see ?Quotes for full details.
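As a quick check of the syntax described above, the following sketch passes one literal of each kind to typeof(). The variable names are hypothetical and chosen only for illustration.

```r
# Each literal is a length-one vector; typeof() reports its type
lgl_scalar <- TRUE
dbl_scalar <- 0xcafe    # hexadecimal literal without L: still a double
int_scalar <- 1234L     # the L suffix makes it an integer
chr_scalar <- "hi"

typeof(lgl_scalar)      # "logical"
typeof(dbl_scalar)      # "double"
typeof(int_scalar)      # "integer"
typeof(chr_scalar)      # "character"

dbl_scalar == 51966     # TRUE: 0xcafe is 51966 in decimal
identical(1, 1L)        # FALSE: a double and an integer differ in type

# An escape sequence such as \t counts as a single character
nchar("a\tb")           # 3
```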
3.2.2 Making longer vectors with c()
To create longer vectors from shorter ones, use c(), short for combine:
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")
When the inputs are atomic vectors, c() always creates another atomic vector; i.e. it flattens:
c(c(1, 2), c(3, 4))
#> [1] 1 2 3 4
In diagrams, I’ll depict vectors as connected rectangles, so the above code could be drawn as follows:
You can determine the type of a vector with typeof() [15] and its length with length().
typeof(lgl_var)
#> [1] "logical"
typeof(int_var)
#> [1] "integer"
typeof(dbl_var)
#> [1] "double"
typeof(chr_var)
#> [1] "character"
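The output above covers typeof() only. A minimal sketch of length() follows, using hypothetical vectors that are redefined here so the snippet runs on its own.

```r
# Hypothetical example vectors, defined so the sketch is self-contained
dbl_example <- c(1, 2.5, 4.5)
chr_example <- c("these are", "some strings")

length(dbl_example)   # 3 elements
length(chr_example)   # 2 elements: length() counts elements, not characters
nchar(chr_example)    # characters per string: 9 and 12
length(TRUE)          # 1: a "scalar" is a vector of length one
```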
3.2.3 Missing values
R represents missing, or unknown, values with a special sentinel value: NA (short for not available). Missing values tend to be infectious: most computations involving a missing value will return another missing value.
NA > 5
#> [1] NA
10 * NA
#> [1] NA
!NA
#> [1] NA
There are only a few exceptions to this rule. These occur when some identity holds for all possible inputs:
NA ^ 0
#> [1] 1
NA | TRUE
#> [1] TRUE
NA & FALSE
#> [1] FALSE
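For contrast, here is a small sketch of nearby cases where no such identity holds, so the result stays missing. The comments give one way to reason about each case; they are an informal reading, not a statement of how R implements it.

```r
# The result is known only when it cannot depend on the missing value
NA | FALSE   # NA: the answer would be TRUE or FALSE depending on the NA
NA & TRUE    # NA: same reasoning
NA * 0       # NA: not every double times 0 is 0
Inf * 0      # NaN, which is why NA * 0 cannot safely be 0
```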
Propagation of missingness leads to a common mistake when determining which values in a vector are missing:
x <- c(NA, 5, NA, 10)
x == NA
#> [1] NA NA NA NA
This result is correct (if a little surprising) because there’s no reason to believe that one missing value has the same value as another. Instead, use is.na() to test for the presence of missingness:
is.na(x)
#> [1]  TRUE FALSE  TRUE FALSE
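Because is.na() returns a logical vector, it combines naturally with counting and subsetting. The sketch below uses a hypothetical vector name with the same values as x; all functions are from base R.

```r
x_missing <- c(NA, 5, NA, 10)   # hypothetical name, same values as x above

sum(is.na(x_missing))           # number of missing values: 2
which(is.na(x_missing))         # their positions: 1 and 3
x_missing[!is.na(x_missing)]    # the non-missing values: 5 and 10
anyNA(x_missing)                # TRUE if at least one value is missing
mean(x_missing, na.rm = TRUE)   # 7.5, computed after dropping the NAs
```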
NB: Technically there are four missing values, one for each of the atomic types: NA (logical), NA_integer_ (integer), NA_real_ (double), and NA_character_ (character). This distinction is usually unimportant because NA will be automatically coerced to the correct type when needed.
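A short sketch of the four typed missing values, and of plain NA adapting to its neighbours when combined with c():

```r
typeof(NA)              # "logical"
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# Plain NA takes on the type of the vector it is combined with
int_with_na <- c(NA, 1L)
chr_with_na <- c(NA, "a")
typeof(int_with_na)     # "integer"
typeof(chr_with_na)     # "character"
```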
3.2.4 Testing and coercion
Generally, you can test if a vector is of a given type with an is.*() function, but these functions need to be used with care. is.logical(), is.integer(), is.double(), and is.character() do what you might expect: they test if a vector is a logical, integer, double, or character. Avoid is.vector(), is.atomic(), and is.numeric(): they don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do.
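To make the warning concrete, here is a small sketch with hypothetical vectors. It shows the type-specific tests behaving as expected, and one documented surprise: is.vector() returns FALSE once an object carries attributes other than names.

```r
int_vec <- c(1L, 2L)   # hypothetical example vectors
dbl_vec <- c(1, 2)

is.integer(int_vec)    # TRUE
is.integer(dbl_vec)    # FALSE: whole-number doubles are still doubles
is.double(dbl_vec)     # TRUE

# is.numeric() is TRUE for both integer and double vectors
is.numeric(int_vec)    # TRUE
is.numeric(dbl_vec)    # TRUE

# is.vector() is FALSE when attributes other than names are present
with_attr <- structure(1:3, my_attr = "x")   # my_attr is a made-up attribute
is.vector(1:3)         # TRUE
is.vector(with_attr)   # FALSE
```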
For atomic vectors, type is a property of the entire vector: all elements must be the same type. When you attempt to combine different types they will be coerced in a fixed order: character → double → integer → logical. For example, combining a character and an integer yields a character:
str(c("a", 1))
#>  chr [1:2] "a" "1"
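The same ordering applies to other combinations. A brief sketch with made-up inputs:

```r
typeof(c(1L, 2.5))     # integer + double   -> "double"
typeof(c(2.5, "a"))    # double + character -> "character"

# With several types present, the result is the most general one
mixed <- c(TRUE, 2L, 3.5, "z")
typeof(mixed)          # "character"
mixed                  # "TRUE" "2" "3.5" "z"
```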
Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to numeric. This coercion is particularly useful for logical vectors because TRUE becomes 1 and FALSE becomes 0.
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1
# Total number of TRUEs
sum(x)
#> [1] 1
# Proportion that are TRUE
mean(x)
#> [1] 0.333
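In practice the logical vector usually comes from a comparison. The sketch below uses hypothetical data to count, and take the proportion of, values that meet a condition.

```r
scores <- c(2, 7, 4, 9, 1)   # hypothetical data

above <- scores > 3          # FALSE TRUE TRUE TRUE FALSE
sum(above)                   # 3 values exceed 3
mean(above)                  # 0.6, the proportion exceeding 3
```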
Generally, you can deliberately coerce by using an as.*() function, like as.logical(), as.integer(), as.double(), or as.character(). Failed coercion of strings generates a warning and a missing value:
as.integer(c("1", "1.5", "a"))
#> Warning: NAs introduced by coercion
#> [1]  1  1 NA
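One possible way to find which inputs failed to convert is to compare missingness before and after coercion. This is a sketch with hypothetical variable names, not a recommended parsing routine; suppressWarnings() is base R.

```r
raw_input <- c("1", "1.5", "a")   # hypothetical input

# Silence the coercion warning because the NAs are handled explicitly below
parsed <- suppressWarnings(as.integer(raw_input))

# Entries that were present before coercion but are missing afterwards
failed <- is.na(parsed) & !is.na(raw_input)
raw_input[failed]   # "a"
parsed              # 1 1 NA
```

Note that "1.5" is silently truncated to 1 rather than flagged, so this check only catches strings that are not numbers at all.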
3.2.5 Exercises
How do you create raw and complex scalars? (See ?raw and ?complex.)
Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():
c(1, FALSE)
c("a", 1)
c(TRUE, 1L)
Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
Precisely what do is.atomic(), is.numeric(), and is.vector() test for?
[13] Technically, the R language does not possess scalars. Everything that looks like a scalar is actually a vector of length one. This is mostly a theoretical distinction, but it does mean that expressions like 1[1] work.
[14] L is not intuitive, and you might wonder where it comes from. At the time L was added to R, R’s integer type was equivalent to a long integer in C, and C code could use a suffix of l or L to force a number to be a long integer. It was decided that l was too visually similar to i (used for complex numbers in R), leaving L.
[15] You may have heard of the related mode() and storage.mode() functions. Do not use them: they exist only for compatibility with S.