3.1 Introduction

This chapter discusses the most important family of data types in base R: vectors[10]. While you’ve probably already used many (if not all) of the different types of vectors, you may not have thought deeply about how they’re interrelated. In this chapter, I won’t cover individual vector types in too much detail, but I will show you how all the types fit together as a whole. If you need more details, you can find them in R’s documentation.

Vectors come in two flavours: atomic vectors and lists[11]. They differ in terms of their elements’ types: for atomic vectors, all elements must have the same type; for lists, elements can have different types. While not a vector, NULL is closely related to vectors and often plays the role of a generic zero-length vector. This diagram, which we’ll be expanding on throughout this chapter, illustrates the basic relationships:

[Diagram: Vector branches into Atomic and List; NULL is shown alongside.]
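The distinction above can be checked directly at the console. The following is a minimal sketch in base R; the variable names are illustrative only and do not come from the text.

```r
# Atomic vectors: every element shares one type, so c() coerces mixed inputs.
int_vec <- c(1L, 2L, 3L)
mixed   <- c(1L, 2.5, TRUE)   # integer and logical are coerced to double
chr_vec <- c(1, "a")          # combining with a string yields character

typeof(int_vec)  # "integer"
typeof(mixed)    # "double"
typeof(chr_vec)  # "character"

# Lists: each element keeps its own type.
lst <- list(1L, 2.5, "a", TRUE)
elem_types <- vapply(lst, typeof, character(1))
elem_types       # "integer" "double" "character" "logical"

# NULL behaves like a generic zero-length vector.
length(NULL)                            # 0
identical(c(NULL, int_vec), int_vec)    # TRUE: combining with NULL adds nothing
is.null(c())                            # TRUE: an empty c() returns NULL
```

Note that c() coerces silently, which is why typeof() is a useful check when a vector does not behave as expected.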

Every vector can also have attributes, which you can think of as a named list of arbitrary metadata. Two attributes are particularly important. The dimension attribute turns vectors into matrices and arrays, and the class attribute powers the S3 object system. While you’ll learn how to use S3 in Chapter 13, here you’ll learn about some of the most important S3 vectors: factors, dates and times, data frames, and tibbles. And while 2D structures like matrices and data frames are not necessarily what comes to mind when you think of vectors, you’ll also learn why R considers them to be vectors.
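As a quick illustration of how attributes change what a vector is, here is a small base R sketch. The attribute name "note" and the variable names are hypothetical, chosen only for this example.

```r
x <- 1:6
attributes(x)            # NULL: a bare vector carries no attributes

# Set and get a single attribute with attr().
# "note" is a made-up attribute name used only for illustration.
attr(x, "note") <- "hypothetical metadata"
attr(x, "note")          # "hypothetical metadata"

# Adding a dim attribute makes the same kind of vector behave as a matrix.
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)             # TRUE
typeof(m)                # "integer": the underlying type is unchanged

# A factor is an integer vector with "levels" and "class" attributes.
f <- factor(c("a", "b", "a"))
typeof(f)                # "integer"
attributes(f)            # levels "a" "b" and class "factor"
```

In each case the data are stored the same way; only the attached metadata differs.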

Quiz

Take this short quiz to determine if you need to read this chapter. If the answers quickly come to mind, you can comfortably skip this chapter. You can check your answers in Section 3.8.

1. What are the four common types of atomic vectors? What are the two rare types?

2. What are attributes? How do you get them and set them?

3. How is a list different from an atomic vector? How is a matrix different from a data frame?

4. Can you have a list that is a matrix? Can a data frame have a column that is a matrix?

5. How do tibbles behave differently from data frames?

Outline

• Section 3.2 introduces you to the atomic vectors: logical, integer, double, and character. These are R’s simplest data structures.

• Section 3.3 takes a small detour to discuss attributes, R’s flexible metadata specification. The most important attributes are names, dimensions, and class.

• Section 3.4 discusses the important vector types that are built by combining atomic vectors with special attributes. These include factors, dates, date-times, and durations.

• Section 3.5 dives into lists. Lists are very similar to atomic vectors, but have one key difference: an element of a list can be any data type, including another list. This makes them suitable for representing hierarchical data.

• Section 3.6 teaches you about data frames and tibbles, which are used to represent rectangular data. They combine the behaviour of lists and matrices to make a structure ideally suited for the needs of statistical data.
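As a small preview of the last two items, the sketch below uses base R only, with made-up names and values. It shows a list nested inside a list, and a data frame answering both list-style and matrix-style questions.

```r
# A list can hold any type, including another list (hierarchical data).
nested <- list(
  id    = 1L,
  tags  = c("a", "b"),
  child = list(id = 2L, tags = character())
)
nested$child$id          # 2
typeof(nested$child)     # "list"

# A data frame is a list of equal-length columns...
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)               # "list"
length(df)               # 2: the number of columns
df[["y"]]                # extract one column, list-style

# ...that can also be indexed by row and column, like a matrix.
dim(df)                  # 3 2
df[2, "x"]               # 2
```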

10. Collectively, all the other data types are known as “node” types, which include things like functions and environments. You’re most likely to come across this highly technical term when using gc(): the “N” in Ncells stands for nodes and the “V” in Vcells stands for vectors.

11. A few places in R’s documentation call lists generic vectors to emphasise their difference from atomic vectors.