9.6 Predicate functionals

A predicate is a function that returns a single TRUE or FALSE, like is.character(), is.null(), or all(). We say a predicate matches a vector if it returns TRUE.

9.6.1 Basics

A predicate functional applies a predicate to each element of a vector. purrr provides six useful functions which come in three pairs:

• some(.x, .p) returns TRUE if any element matches; every(.x, .p) returns TRUE if all elements match.

These are similar to any(map_lgl(.x, .p)) and all(map_lgl(.x, .p)) but they terminate early: some() returns TRUE when it sees the first TRUE, and every() returns FALSE when it sees the first FALSE.

• detect(.x, .p) returns the value of the first match; detect_index(.x, .p) returns the location of the first match.

• keep(.x, .p) keeps all matching elements; discard(.x, .p) drops all matching elements.
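The early termination of some() and every() described above can be sketched in base R. This is a minimal illustration, not purrr's implementation: simple_some(), simple_every(), and is_negative() are hypothetical names introduced here, and the call counter exists only to make the early exit visible.

```r
# Hypothetical base R sketches of some() and every(); not purrr's actual code.
simple_some <- function(x, p) {
  for (el in x) {
    if (isTRUE(p(el))) return(TRUE)   # stop at the first match
  }
  FALSE
}

simple_every <- function(x, p) {
  for (el in x) {
    if (!isTRUE(p(el))) return(FALSE) # stop at the first non-match
  }
  TRUE
}

# Count predicate calls so the early exit can be observed.
calls <- 0
is_negative <- function(v) {
  calls <<- calls + 1
  v < 0
}

x <- list(-1, 2, 3, 4)

some_result <- simple_some(x, is_negative)
some_calls <- calls   # 1: the first element already matches

calls <- 0
every_result <- simple_every(x, is_negative)
every_calls <- calls  # 2: the second element is the first non-match
```

In both cases the predicate is called fewer than length(x) times, whereas any(map_lgl(.x, .p)) and all(map_lgl(.x, .p)) would evaluate it on every element.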

The following example shows how you might use these functionals with a data frame:

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
detect(df, is.factor)
#> NULL
detect_index(df, is.factor)
#> [1] 0

str(keep(df, is.factor))
#> 'data.frame':    3 obs. of  0 variables
str(discard(df, is.factor))
#> 'data.frame':    3 obs. of  2 variables:
#>  $ x: int  1 2 3
#>  $ y: chr  "a" "b" "c"
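For comparison, base R offers rough counterparts to these functionals: Filter() for keep(), Find() for detect(), and Position() for detect_index(). The sketch below assumes a small data frame like the one above. Argument order differs (function first), and Position() returns NA rather than 0 when nothing matches.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Filter() keeps the matching elements, like keep()
kept <- Filter(is.numeric, df)

# Filter() with a negated predicate drops them, like discard()
dropped <- Filter(Negate(is.numeric), df)

# Find() returns the value of the first match, like detect()
first_chr <- Find(is.character, df)

# Position() returns the location of the first match, like detect_index()
first_chr_pos <- Position(is.character, df)

# When nothing matches, Find() returns NULL and Position() returns NA
no_match <- Find(is.factor, df)
no_match_pos <- Position(is.factor, df)
```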

9.6.2 Map variants

map() and modify() come in variants that also take predicate functions, transforming only the elements of .x where .p is TRUE.

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

str(map_if(df, is.numeric, mean))
#> List of 3
#>  $ num1: num 10
#>  $ num2: num 6
#>  $ chr1: chr [1:3] "a" "b" "c"
str(modify_if(df, is.numeric, mean))
#> 'data.frame':    3 obs. of  3 variables:
#>  $ num1: num  10 10 10
#>  $ num2: num  6 6 6
#>  $ chr1: chr  "a" "b" "c"
str(map(keep(df, is.numeric), mean))
#> List of 2
#>  $ num1: num 10
#>  $ num2: num 6
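To make the behaviour of map_if() concrete, here is a minimal base R sketch. simple_map_if() is a hypothetical helper written for illustration, not purrr's implementation, which has further features such as an .else argument. It applies f only where the predicate matches and passes the remaining elements through unchanged.

```r
# Hypothetical sketch of map_if(); purrr's real function does more.
simple_map_if <- function(x, p, f, ...) {
  out <- as.list(x)                              # always return a list
  matches <- vapply(out, p, logical(1))          # one TRUE/FALSE per element
  out[matches] <- lapply(out[matches], f, ...)   # transform matches only
  out
}

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

res <- simple_map_if(df, is.numeric, mean)
```

Here res is a list of three elements: the two numeric columns are replaced by their means, and chr1 is left as it was.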

9.6.3 Exercises

1. Why isn't is.na() a predicate function? What base R function is closest to being a predicate version of is.na()?

2. simple_reduce() has a problem when x is length 0 or length 1. Describe the source of the problem and how you might go about fixing it.

simple_reduce <- function(x, f) {
  out <- x[[1]]
  for (i in seq(2, length(x))) {
    out <- f(out, x[[i]])
  }
  out
}

3. Implement the span() function from Haskell: given a list x and a predicate function f, span(x, f) returns the location of the longest sequential run of elements where the predicate is true. (Hint: you might find rle() helpful.)

4. Implement arg_max(). It should take a function and a vector of inputs, and return the elements of the input where the function returns the highest value. For example, arg_max(-10:5, function(x) x ^ 2) should return -10. arg_max(-5:5, function(x) x ^ 2) should return c(-5, 5). Also implement the matching arg_min() function.

5. The function below scales a vector so it falls in the range [0, 1]. How would you apply it to every column of a data frame? How would you apply it to every numeric column in a data frame?

scale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}