# 24 Type-stability

The less you need to know about a function’s inputs to predict the type of its output, the better. Ideally, a function should either always return the same type of thing, or return something that can be trivially computed from its inputs.

If a function is **type-stable** it satisfies two conditions:

- You can predict the output type based only on the input types (not their values).

- If the function uses `...`, the order of the arguments does not affect the output type.

```
library(vctrs)
```
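As a rough illustration of what violating each condition looks like, here is a minimal sketch in base R. Both `sign_label()` and `combine_like_first()` are hypothetical functions invented for this example; they are not part of any package.

```r
# Hypothetical function: the output type depends on the *value* of x,
# so knowing that x is a double is not enough to predict the result.
sign_label <- function(x) {
  if (x >= 0) x else "negative"
}
typeof(sign_label(1))   # "double"
typeof(sign_label(-1))  # "character"

# Hypothetical function: every argument is coerced to the type of the
# first one, so the output type depends on the order of the arguments.
combine_like_first <- function(...) {
  xs <- list(...)
  target <- typeof(xs[[1]])
  unlist(lapply(xs, as.vector, mode = target))
}
typeof(combine_like_first(1L, 2.5))  # "integer"
typeof(combine_like_first(2.5, 1L))  # "double"
```

The first function breaks the first condition and the second breaks the second condition. In both cases the caller has to know more than the input types to predict what comes back.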

## 24.1 Simple examples

- `purrr::map()` and `base::lapply()` are trivially type-stable because they always return lists.

- `paste()` is type-stable because it always returns a character vector.

  ```
  vec_ptype(paste(1))
  #> character(0)
  vec_ptype(paste("x"))
  #> character(0)
  ```

- `base::mean(x)` almost always returns the same type of output as `x`. For example, the mean of a numeric vector is a numeric vector, and the mean of a date-time is a date-time.

  ```
  vec_ptype(mean(1))
  #> numeric(0)
  vec_ptype(mean(Sys.time()))
  #> POSIXct of length 0
  ```

- `ifelse()` is not type-stable because the output type depends on the value:

  ```
  vec_ptype(ifelse(NA, 1L, 2))
  #> <unspecified> [0]
  vec_ptype(ifelse(FALSE, 1L, 2))
  #> numeric(0)
  vec_ptype(ifelse(TRUE, 1L, 2))
  #> integer(0)
  ```

## 24.2 More complicated examples

Some functions are more complex because they take multiple input types and have to return a single output type. This includes functions like `c()` and `ifelse()`. The rules governing base R functions are idiosyncratic, and each function tends to apply its own slightly different set of rules. Tidy functions should use the consistent set of rules provided by the vctrs package.
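To make the contrast concrete, the following sketch compares `vctrs::vec_c()` and `vctrs::vec_ptype_common()` with base `c()`. It assumes the vctrs package is installed. The inputs are arbitrary, and only a couple of type combinations are shown, not the full set of coercion rules.

```r
library(vctrs)

# The common type of integer and double does not depend on argument order.
identical(vec_ptype_common(1L, 2.5), vec_ptype_common(2.5, 1L))  # TRUE
typeof(vec_c(1L, 2.5))  # "double"
typeof(vec_c(2.5, 1L))  # "double"

# Base c() silently coerces numbers to character ...
typeof(c(1, "a"))  # "character"

# ... whereas vec_c() signals an error because double and character
# have no common type under the vctrs rules.
res <- tryCatch(vec_c(1, "a"), error = function(e) e)
inherits(res, "error")  # TRUE
```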

## 24.3 Challenge: the median

A more challenging example is `median()`. The median of a vector is a value that (as evenly as possible) splits the vector into a lower half and an upper half. In the absence of ties, `mean(x > median(x))` and `mean(x <= median(x))` are both close to 0.5 (exactly 0.5 for even lengths). The median is straightforward to compute for odd lengths: you simply order the vector and pick the value in the middle, i.e. `sort(x)[(length(x) + 1) / 2]`. It’s clear that the type of the output should be the same as the type of `x`, and this algorithm can be applied to any vector that can be ordered.
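The odd-length case can be sketched as below. `median_odd()` is a hypothetical helper written for this illustration, not a base R or vctrs function. It takes the middle element at index `(length(x) + 1) / 2` after sorting.

```r
# Hypothetical helper: only valid for odd-length vectors.
median_odd <- function(x) {
  stopifnot(length(x) %% 2 == 1)
  # Sort, then pick the middle element; indexing preserves the type of x.
  sort(x)[(length(x) + 1) / 2]
}

typeof(median_odd(c(3L, 1L, 2L)))                      # "integer"
median_odd(c("b", "c", "a"))                           # "b"
class(median_odd(as.Date("2020-01-01") + c(2, 0, 1)))  # "Date"
```

Because the result is always an element of `x`, the output type follows directly from the input type for integers, strings, and dates alike.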

But what if the vector has an even length? In this case, there’s no longer a unique median, and by convention we usually take the mean of the middle two numbers.

In R, this means that `median()` is not type-stable:

```
typeof(median(1:3))
#> [1] "integer"
typeof(median(1:4))
#> [1] "double"
```

Base R doesn’t appear to follow a consistent principle when computing the median of a vector of length 2. Factors throw an error, but dates do not (even though there’s no date halfway between two days that differ by an odd number of days).

```
median(factor(1:2))
#> Error in median.default(factor(1:2)): need numeric data
median(Sys.Date() + 0:1)
#> [1] "2020-09-23"
```

To be clear, the problems caused by this behaviour are quite small in practice, but it makes the analysis of `median()` more complex, and it makes it difficult to decide what principle you should adhere to when creating `median` methods for new vector classes.

```
median("foo")
#> [1] "foo"
median(c("foo", "bar"))
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]): argument
#> is not numeric or logical: returning NA
#> [1] NA
```
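For comparison, one possible type-stable variant is a "low median" that always returns an element of the input, picking the lower of the two middle values when the length is even. This is only a sketch of one design option: `median_low()` is a hypothetical name and is not something base R or vctrs provides.

```r
# Hypothetical helper, not part of base R or vctrs.
median_low <- function(x) {
  x <- sort(x)  # sort() drops missing values by default
  # Integer division picks the middle element for odd lengths and the
  # lower of the two middle elements for even lengths.
  x[(length(x) + 1) %/% 2]
}

typeof(median_low(1:3))                       # "integer"
typeof(median_low(1:4))                       # "integer"
median_low(c("foo", "bar"))                   # "bar"
class(median_low(as.Date("2020-09-23") + 0:1))  # "Date"
```

The trade-off is that the result no longer averages the two middle values, so it differs from the conventional median for even-length numeric vectors.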

## 24.4 Exercises

How is a date like an integer? Why is this inconsistent?

```
vec_ptype(mean(Sys.Date()))
#> Date of length 0
vec_ptype(mean(1L))
#> numeric(0)
```