11 Case study: rep()
11.1 What does rep() do?
rep() is an extremely useful base R function that repeats a vector x in various ways. It has three details arguments (times, each, and length.out)[3] that interact in complicated ways. Let's explore the basics first:
x <- c(1, 2, 4)
rep(x, times = 3)
#> [1] 1 2 4 1 2 4 1 2 4
rep(x, length.out = 10)
#> [1] 1 2 4 1 2 4 1 2 4 1
times and length.out replicate the vector in the same way, but length.out allows you to specify a non-integer number of replications. If you specify both, length.out wins.
rep(x, times = 3, length.out = 10)
#> [1] 1 2 4 1 2 4 1 2 4 1
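One way to think about length.out is as "repeat the whole vector enough times, then truncate". The sketch below checks that reading for this x; the ceiling() arithmetic is only an illustration of the relationship, not a claim about how rep() is implemented internally.

```r
x <- c(1, 2, 4)
n <- 10

# Repeat the whole vector enough whole times to reach at least n elements
# (10 / 3 rounds up to 4 copies, i.e. 12 elements), then keep the first n.
by_times <- rep(x, times = ceiling(n / length(x)))[seq_len(n)]
by_length <- rep(x, length.out = n)

identical(by_times, by_length)
#> [1] TRUE
```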
The each argument repeats individual components of the vector rather than the whole vector:
rep(x, each = 3)
#> [1] 1 1 1 2 2 2 4 4 4
And you can combine that with times:
rep(x, each = 3, times = 2)
#> [1] 1 1 1 2 2 2 4 4 4 1 1 1 2 2 2 4 4 4
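When both are supplied, each is applied first and times then repeats the expanded vector as a whole. A quick check of that ordering, written as two nested rep() calls:

```r
x <- c(1, 2, 4)

# `each` first gives 1 1 1 2 2 2 4 4 4; `times` then repeats that whole vector.
combined <- rep(x, each = 3, times = 2)
nested <- rep(rep(x, each = 3), times = 2)

identical(combined, nested)
#> [1] TRUE
```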
If you supply a vector to times, it works in a similar way to each, repeating each component the specified number of times:
rep(x, times = x)
#> [1] 1 2 2 4 4 4 4
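Put another way, a vector times pairs each element with its own count. The sketch below spells that out by repeating element by element with Map() and flattening the result; it is an equivalent formulation for this input, not how rep() works internally.

```r
x <- c(1, 2, 4)

# Map() calls rep(x[i], x[i]) for each i; unlist() flattens the pieces.
elementwise <- unlist(Map(rep, x, x))
elementwise
#> [1] 1 2 2 4 4 4 4

identical(elementwise, rep(x, times = x))
#> [1] TRUE
```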
11.2 What makes this function hard to understand?
There's a complicated dependency between times, length.out, and each:

- times and length.out both control the same underlying variable in different ways, and you can not set them simultaneously.
- times and each are mostly independent, but if you specify a vector for times you can't use each.
  rep(1:3, times = c(2, 2, 2), each = 2)
  #> Error in rep(1:3, times = c(2, 2, 2), each = 2): invalid 'times' argument
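The error above comes from a length mismatch. As I read R's ?rep documentation, a vector times is matched against x after replication by each, so here it would need six counts (length(x) * each), not three. A short check of both cases:

```r
# With each = 2, 1:3 is first expanded to 1 1 2 2 3 3 (six elements),
# so a vector `times` must supply one count per expanded element.
ok <- rep(1:3, times = rep(2, 6), each = 2)
identical(ok, rep(1:3, each = 4))
#> [1] TRUE

# A length-3 `times` does not line up with the six expanded elements.
msg <- tryCatch(
  rep(1:3, times = c(2, 2, 2), each = 2),
  error = conditionMessage
)
msg
#> [1] "invalid 'times' argument"
```

So the combination is not forbidden outright, but it requires counting elements of the already-expanded vector, which arguably makes the interaction harder to reason about.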
I think using times with a vector is confusing because it switches from replicating the whole vector to replicating individual values of the vector, like each usually does.
rep(1:3, each = 2)
#> [1] 1 1 2 2 3 3
rep(1:3, times = 2)
#> [1] 1 2 3 1 2 3
rep(1:3, times = c(2, 2, 2))
#> [1] 1 1 2 2 3 3
I think these two problems have the same underlying cause: rep() is trying to do too much in a single function. I think we can make things simpler by turning rep() into two functions: one that replicates the full vector, and one that replicates each element of the vector.
11.3 How might we improve the situation?
To create two new functions, we first need to come up with names: I like rep_each() and rep_full(). rep_each() was a fairly easy name to come up with. rep_full() was a little harder and took a few iterations: I like that full has the same number of letters as each, which makes the two functions look like they belong together.
Next, we need to think about their arguments. Both will have a single data argument: x, the vector to replicate. rep_each() has a single details argument, which specifies the number of times to replicate each element. rep_full() has two mutually exclusive details arguments: the number of times to repeat the whole vector, or the desired length of the output.
What should we call the arguments? We've already captured the different replication strategies (each vs. full) in the function name, so I think the argument that specifies the number of times to replicate can have the same name in both functions, and times seems reasonable. For the second argument to rep_full(), I draw inspiration from rep(), which uses length.out. I think it's obvious that the argument controls the output, so length is adequate.
rep_each <- function(x, times) {
  # Recycle `times` so there is one count per element of `x`
  times <- rep(times, length.out = length(x))
  rep(x, times = times)
}
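rep_full <- function(x, times, length) {
  # Exactly one of the two details arguments must be supplied
  if (!xor(missing(times), missing(length))) {
    stop("Must supply exactly one of `times` and `length`", call. = FALSE)
  }

  # Convert a number of whole-vector copies into the equivalent output length
  if (!missing(times)) {
    length <- times * base::length(x)
  }

  rep(x, length.out = length)
}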
rep_full() and rep_each() could equally be implemented with the base helpers rep_len() and rep.int() (the vctrs versions below do exactly that), and the existence of those helpers suggests that R-core members are aware of the problem.
(Note the downside of using length as the argument name: we have to call base::length() to avoid evaluating the missing length argument when times is supplied.)
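To see the problem concretely, here are two hypothetical helpers, naive() and safe(), that differ only in how they call length(). When R looks up the function in length(x), it finds the local argument first, and if that argument is missing the call fails instead of falling through to base::length().

```r
# Hypothetical helpers, used only to illustrate the lookup problem.
naive <- function(x, length) {
  # `length(x)` finds the missing local argument `length` and errors
  if (missing(length)) length <- 2 * length(x)
  length
}

safe <- function(x, length) {
  # Naming the namespace bypasses the local argument entirely
  if (missing(length)) length <- 2 * base::length(x)
  length
}

msg <- tryCatch(naive(1:3), error = conditionMessage)
safe(1:3)
#> [1] 6
```

Printing msg shows R's "argument ... is missing" error for length.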
x <- c(1, 2, 4)
rep_each(x, times = 2)
#> [1] 1 1 2 2 4 4
rep_full(x, times = 2)
#> [1] 1 2 4 1 2 4
rep_each(x, times = x)
#> [1] 1 2 2 4 4 4 4
rep_full(x, length = 5)
#> [1] 1 2 4 1 2
One downside of this approach is that if you want to replicate both each component and the entire vector, you have to use two function calls, which is much more verbose than the rep() equivalent. However, I don't think this is a terribly common use case, so I think a longer but more readable call is an acceptable price.
That said, one argument for a single rep() function is that rep_each() and rep_full() return the same result if you change their order (i.e. they're commutative, at least when both are called with times):
rep_full(rep_each(x, times = 2), times = 3)
#> [1] 1 1 2 2 4 4 1 1 2 2 4 4 1 1 2 2 4 4
rep_each(rep_full(x, times = 3), times = 2)
#> [1] 1 1 2 2 4 4 1 1 2 2 4 4 1 1 2 2 4 4
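A small sketch of where that commutativity holds and where it stops. It repeats the simple rep()-based definitions from above so the block runs on its own, checks that both orders match rep(x, each = 2, times = 3), and then shows that truncating with length breaks the symmetry.

```r
rep_each <- function(x, times) {
  times <- rep(times, length.out = length(x))
  rep(x, times = times)
}

rep_full <- function(x, times, length) {
  if (!xor(missing(times), missing(length))) {
    stop("Must supply exactly one of `times` and `length`", call. = FALSE)
  }
  if (!missing(times)) {
    length <- times * base::length(x)
  }
  rep(x, length.out = length)
}

x <- c(1, 2, 4)

# With `times` in both calls, order does not matter, and either order
# matches the single base call that uses `each` and `times`.
a <- rep_full(rep_each(x, times = 2), times = 3)
b <- rep_each(rep_full(x, times = 3), times = 2)
identical(a, b)
#> [1] TRUE
identical(a, rep(x, each = 2, times = 3))
#> [1] TRUE

# Truncating with `length` is order-dependent: expanding then cutting
# differs from cutting then expanding.
c1 <- rep_full(rep_each(x, times = 2), length = 5)
c2 <- rep_each(rep_full(x, length = 5), times = 2)
c1
#> [1] 1 1 2 2 4
length(c2)
#> [1] 10
```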
11.4 Dealing with bad inputs
The implementations above work well for correct inputs, but they also run without error on a number of incorrect inputs:
rep_full(1:3, 1:3)
#> Warning in rep(x, length.out = length): first element used of 'length.out'
#> argument
#> [1] 1 2 3
In the code below, I have used vec_assert() and vec_recycle() to make the desired types, sizes, and recycling rules explicit.
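The recycling rule is the key difference. The first rep_each() recycled times with rep(length.out = ), which accepts any length, whereas vctrs only recycles size-1 inputs (same-size inputs pass through unchanged). A small comparison, assuming the vctrs package is installed:

```r
library(vctrs)

# rep(length.out = ) recycles anything, even when lengths don't divide evenly,
# so a length-2 `times` silently becomes c(1, 2, 1) for a length-3 vector...
rep(c(1, 2), length.out = 3)
#> [1] 1 2 1
# ...which is what the first rep_each(1:3, 1:2) would end up doing.
rep(1:3, times = rep(c(1, 2), length.out = 3))
#> [1] 1 2 2 3

# vec_recycle() recycles a size-1 input but refuses a size-2 one.
vec_recycle(2, 3)
#> [1] 2 2 2
res <- tryCatch(vec_recycle(c(1, 2), 3), error = function(e) "error")
res
#> [1] "error"
```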
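library(vctrs)

rep_each <- function(x, times) {
  # `times` must be a double vector; recycle it to one count per element
  # (vec_recycle() errors unless it has size 1 or vec_size(x))
  vec_assert(times, numeric())
  times <- vec_recycle(times, vec_size(x))

  rep.int(x, times)
}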
rep_full <- function(x, times, length) {
  if (!xor(missing(times), missing(length))) {
    stop("Must supply exactly one of `times` and `length`", call. = FALSE)
  } else if (!missing(times)) {
    # `times` must be a single number
    vec_assert(times, numeric(), 1L)
    length <- times * base::length(x)
  } else if (!missing(length)) {
    # `length` must be a single number
    vec_assert(length, numeric(), 1L)
  }

  rep_len(x, length)
}
rep_full(1:3, "x")
#> Error: `times` must be a vector with type <double>.
#> Instead, it has type <character>.
rep_full(1:3, c(1, 2))
#> Error: `times` must have size 1, not size 2.
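The vctrs versions delegate to the base helpers rep.int() and rep_len(). A quick look at the base behaviour they rely on shows why rep_each() recycles times before calling rep.int(): with a single count, rep.int() repeats the whole vector, and only with one count per element does it repeat element by element.

```r
x <- c(1, 2, 4)

# A scalar count repeats the whole vector...
rep.int(x, 2)
#> [1] 1 2 4 1 2 4
# ...while one count per element repeats each element in place.
rep.int(x, c(2, 2, 2))
#> [1] 1 1 2 2 4 4

# rep_len() only takes a target length, like rep_full()'s `length` path.
rep_len(x, 5)
#> [1] 1 2 4 1 2
```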
[3] Note that the function signature is rep(x, ...), so times, each, and length.out do not appear explicitly. You have to read the documentation to discover these arguments.