3.2 Atomic vectors
There are four primary types of atomic vectors: logical, integer, double, and character (which contains strings). Collectively integer and double vectors are known as numeric vectors¹². There are two rare types: complex and raw. I won’t discuss them further because complex numbers are rarely needed in statistics, and raw vectors are a special type that’s only needed when handling binary data.
3.2.1 Scalars
Each of the four primary types has a special syntax to create an individual value, AKA a scalar¹³:
- Logicals can be written in full (TRUE or FALSE), or abbreviated (T or F).
- Doubles can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) form. There are three special values unique to doubles: Inf, -Inf, and NaN (not a number). These are special values defined by the floating point standard.
- Integers are written similarly to doubles but must be followed by L¹⁴ (1234L, 1e4L, or 0xcafeL), and cannot contain fractional values.
- Strings are surrounded by " ("hi") or ' ('bye'). Special characters are escaped with \; see ?Quotes for full details.
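As a quick check of the literal forms listed above, the following sketch passes one literal of each kind to typeof() (described in the next section). The specific values are only illustrations.

```r
# Each literal creates a length-one vector of the corresponding type.
typeof(TRUE)
#> [1] "logical"
typeof(0.1234)
#> [1] "double"
typeof(1234L)
#> [1] "integer"
typeof("hi")
#> [1] "character"

# A hexadecimal literal is a double unless it carries the L suffix.
typeof(0xcafe)
#> [1] "double"
typeof(0xcafeL)
#> [1] "integer"

# Scientific notation with L is allowed when the value is a whole number.
1e4L == 10000L
#> [1] TRUE
```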
3.2.2 Making longer vectors with c()
To create longer vectors from shorter ones, use c(), short for combine:
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")
When the inputs are atomic vectors, c() always creates another atomic vector; i.e. it flattens:
c(c(1, 2), c(3, 4))
#> [1] 1 2 3 4
In diagrams, I’ll depict vectors as connected rectangles, so the above code could be drawn as follows:
[Figure: vectors drawn as connected rectangles]
You can determine the type of a vector with typeof()¹⁵ and its length with length().
typeof(lgl_var)
#> [1] "logical"
typeof(int_var)
#> [1] "integer"
typeof(dbl_var)
#> [1] "double"
typeof(chr_var)
#> [1] "character"
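The text above also mentions length(). A minimal sketch of its use follows; the vectors are redefined here so the snippet runs on its own.

```r
lgl_var <- c(TRUE, FALSE)
chr_var <- c("these are", "some strings")

# length() counts elements, not characters.
length(lgl_var)
#> [1] 2
length(chr_var)
#> [1] 2

# Because c() flattens, nested calls contribute all of their elements.
length(c(c(1, 2), c(3, 4)))
#> [1] 4

# A "scalar" is simply a vector of length one.
length(1L)
#> [1] 1
```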
3.2.3 Missing values
R represents missing, or unknown, values with a special sentinel value: NA (short for not available). Missing values tend to be infectious: most computations involving a missing value will return another missing value.
NA > 5
#> [1] NA
10 * NA
#> [1] NA
!NA
#> [1] NA
There are only a few exceptions to this rule. These occur when some identity holds for all possible inputs:
NA ^ 0
#> [1] 1
NA | TRUE
#> [1] TRUE
NA & FALSE
#> [1] FALSE
Propagation of missingness leads to a common mistake when determining which values in a vector are missing:
x <- c(NA, 5, NA, 10)
x == NA
#> [1] NA NA NA NA
This result is correct (if a little surprising) because there’s no reason to believe that one missing value has the same value as another. Instead, use is.na() to test for the presence of missingness:
is.na(x)
#> [1] TRUE FALSE TRUE FALSE
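Since is.na() returns a logical vector, it combines naturally with subsetting and summary functions. The sketch below shows a few common patterns, reusing the same example vector; anyNA() is a base R helper not otherwise discussed in this section.

```r
x <- c(NA, 5, NA, 10)

# Count the missing values: TRUE is treated as 1 when summed.
sum(is.na(x))
#> [1] 2

# Keep only the non-missing values.
x[!is.na(x)]
#> [1]  5 10

# Check whether any value is missing at all.
anyNA(x)
#> [1] TRUE
```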
NB: Technically there are four missing values, one for each of the atomic types: NA (logical), NA_integer_ (integer), NA_real_ (double), and NA_character_ (character). This distinction is usually unimportant because NA will be automatically coerced to the correct type when needed.
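The four typed missing values, and the automatic coercion of a plain NA, can be checked with typeof(). This is a small illustrative sketch rather than an exhaustive test:

```r
# Each typed NA reports its own type.
typeof(NA)
#> [1] "logical"
typeof(NA_integer_)
#> [1] "integer"
typeof(NA_real_)
#> [1] "double"
typeof(NA_character_)
#> [1] "character"

# A plain (logical) NA takes on the type of the values it is combined with.
typeof(c(NA, 1L))
#> [1] "integer"
typeof(c(NA, "a"))
#> [1] "character"
```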
3.2.4 Testing and coercion
Generally, you can test if a vector is of a given type with an is.*() function, but these functions need to be used with care. is.logical(), is.integer(), is.double(), and is.character() do what you might expect: they test if a vector is a logical, integer, double, or character. Avoid is.vector(), is.atomic(), and is.numeric(): they don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do.
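To give a flavour of why these three functions are easy to misread, here is a short sketch of some behaviours that tend to surprise people. It is not a complete description; the documentation remains the authority, and the "units" attribute is just a made-up example.

```r
# is.numeric() is TRUE for both integers and doubles...
is.numeric(1L)
#> [1] TRUE
is.numeric(1.5)
#> [1] TRUE
# ...but FALSE for factors, even though they are built on integer vectors.
is.numeric(factor("a"))
#> [1] FALSE

# is.vector() tolerates names but no other attributes.
x <- c(a = 1, b = 2)
is.vector(x)
#> [1] TRUE
attr(x, "units") <- "cm"  # hypothetical attribute, for illustration only
is.vector(x)
#> [1] FALSE

# is.vector() is also TRUE for lists, which are not atomic.
is.vector(list(1, "a"))
#> [1] TRUE
```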
For atomic vectors, type is a property of the entire vector: all elements must be the same type. When you attempt to combine different types, they will be coerced to the most flexible type present, following the fixed order character → double → integer → logical. For example, combining a character and a number yields a character:
str(c("a", 1))
#> chr [1:2] "a" "1"
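The same ordering can be seen one step at a time. In this sketch, each call to c() mixes two adjacent types from the hierarchy, and typeof() shows which one wins:

```r
# logical + integer -> integer
typeof(c(TRUE, 1L))
#> [1] "integer"

# integer + double -> double
typeof(c(1L, 2.5))
#> [1] "double"

# double + character -> character
typeof(c(2.5, "a"))
#> [1] "character"

# Mixing all four types yields a character vector.
str(c(TRUE, 1L, 2.5, "a"))
#>  chr [1:4] "TRUE" "1" "2.5" "a"
```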
Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to numeric. This coercion is particularly useful for logical vectors because TRUE becomes 1 and FALSE becomes 0.
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1
# Total number of TRUEs
sum(x)
#> [1] 1
# Proportion that are TRUE
mean(x)
#> [1] 0.333
Generally, you can deliberately coerce by using an as.*() function, like as.logical(), as.integer(), as.double(), or as.character(). Failed coercion of strings generates a warning and a missing value:
as.integer(c("1", "1.5", "a"))
#> Warning: NAs introduced by coercion
#> [1] 1 1 NA
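If you need to know which inputs failed to convert, one possible approach (a sketch, not the only way) is to compare missingness before and after coercion. The names input, converted, and failed are illustrative.

```r
input <- c("1", "1.5", "a")

# suppressWarnings() silences the "NAs introduced by coercion" warning;
# only do this when you check for failures yourself, as below.
converted <- suppressWarnings(as.integer(input))

# A failure is a value that is NA after coercion but was not NA before.
failed <- is.na(converted) & !is.na(input)
input[failed]
#> [1] "a"
```

Note that "1.5" is not reported as a failure: it is converted to 1, with the fractional part dropped and no warning issued.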
3.2.5 Exercises
1. How do you create raw and complex scalars? (See ?raw and ?complex.)
2. Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():
   c(1, FALSE)
   c("a", 1)
   c(TRUE, 1L)
3. Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
4. Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
5. Precisely what do is.atomic(), is.numeric(), and is.vector() test for?
12. This is a slight simplification as R does not use “numeric” consistently, which we’ll come back to in Section 12.3.1.
13. Technically, the R language does not possess scalars. Everything that looks like a scalar is actually a vector of length one. This is mostly a theoretical distinction, but it does mean that expressions like 1[1] work.
14. L is not intuitive, and you might wonder where it comes from. At the time L was added to R, R’s integer type was equivalent to a long integer in C, and C code could use a suffix of l or L to force a number to be a long integer. It was decided that l was too visually similar to i (used for complex numbers in R), leaving L.
15. You may have heard of the related mode() and storage.mode() functions. Do not use them: they exist only for compatibility with S.