21.3 LaTeX

The next DSL will convert R expressions into their LaTeX math equivalents. (This is a bit like ?plotmath, but for text instead of plots.) LaTeX is the lingua franca of mathematicians and statisticians: it’s common to use LaTeX notation whenever you want to express an equation in text, like in email. Since many reports are produced using both R and LaTeX, it might be useful to be able to automatically convert mathematical expressions from one language to the other.

Because we need to convert both functions and names, this mathematical DSL will be more complicated than the HTML DSL. We’ll also need to create a default conversion, so that symbols that we don’t know about get a standard conversion. This means that we can no longer use just evaluation: we also need to walk the abstract syntax tree (AST).

21.3.1 LaTeX mathematics

Before we begin, let’s quickly cover how formulas are expressed in LaTeX. The full standard is very complex, but fortunately is well documented, and the most common commands have a fairly simple structure:

  • Most simple mathematical equations are written in the same way you’d type them in R: x * y, z ^ 5. Subscripts are written using _ (e.g., x_1).

  • Special characters start with a \: \pi = \(\pi\), \pm = \(\pm\), and so on. There are a huge number of symbols available in LaTeX: searching online for latex math symbols returns many lists. There’s even a service that will look up the symbol you sketch in the browser.

  • More complicated functions look like \name{arg1}{arg2}. For example, to write a fraction you’d use \frac{a}{b}. To write a square root, you’d use \sqrt{a}.

  • To group elements together, use {}: e.g., x ^ a + b versus x ^ {a + b}.

  • In good math typesetting, a distinction is made between variables and functions. But without extra information, LaTeX doesn’t know whether f(a * b) represents calling the function f with input a * b, or is shorthand for f * (a * b). If f is a function, you can tell LaTeX to typeset it using an upright font with \textrm{f}(a * b). (The rm stands for “Roman”, the opposite of italics.)
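
As a quick illustration of the rules above, the following LaTeX fragment combines them. It is only a sketch: the particular expressions are made up for illustration, and each line is meant to be placed inside a math environment.

```latex
% Subscript, special symbols, and plain arithmetic
x_1 * \pi \pm z ^ 5

% Commands with braced arguments can be nested
\sqrt{\frac{a}{b}}

% Grouping changes what the exponent applies to:
% only a is raised on the left, a + b on the right
x ^ a + b \qquad x ^ {a + b}

% Upright font marks f as a function rather than a variable
\textrm{f}(a * b)
```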

21.3.2 Goal

Our goal is to use these rules to automatically convert an R expression to its appropriate LaTeX representation. We’ll tackle this in four stages:

  • Convert known symbols: pi → \pi

  • Leave other symbols unchanged: x → x, y → y

  • Convert known functions to their special forms: sqrt(frac(a, b)) → \sqrt{\frac{a}{b}}

  • Wrap unknown functions with \mathrm, the math-mode counterpart of \textrm: f(a) → \mathrm{f}(a)

We’ll code this translation in the opposite direction of what we did with the HTML DSL. We’ll start with infrastructure, because that makes it easy to experiment with our DSL, and then work our way back down to generate the desired output.

21.3.3 to_math()

To begin, we need a wrapper function that will convert R expressions into LaTeX math expressions. This will work like to_html() by capturing the unevaluated expression and evaluating it in a special environment. There are two main differences:

  • The evaluation environment is no longer constant, as it has to vary depending on the input. This is necessary to handle unknown symbols and functions.

  • We never evaluate in the argument environment because we’re translating every function to a LaTeX expression. The user will need to explicitly use !! in order to evaluate normally.

This gives us:

to_math <- function(x) {
  expr <- enexpr(x)
  out <- eval_bare(expr, latex_env(expr))

  latex(out)
}

latex <- function(x) structure(x, class = "advr_latex")
print.advr_latex <- function(x, ...) {
  cat("<LATEX> ", x, "\n", sep = "")
  invisible(x)
}
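
To see why a small class with its own print method is worthwhile, it can help to compare it with R's default printing. The following is a stand-alone base R sketch that repeats the class definition so it can be run on its own; the input string is made up for illustration.

```r
# Stand-alone copy of the class, using only base R.
latex <- function(x) structure(x, class = "advr_latex")

# `...` keeps the signature compatible with the print() generic.
print.advr_latex <- function(x, ...) {
  cat("<LATEX> ", x, "\n", sep = "")
  invisible(x)
}

out <- latex("\\frac{a}{b}")

# Default printing shows the string as R source, with escaped backslashes
unclass(out)
#> [1] "\\frac{a}{b}"

# The custom method shows the characters LaTeX will actually receive
print(out)
#> <LATEX> \frac{a}{b}
```

The underlying string is the same in both cases; only the display differs.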

Next we’ll build up latex_env(), starting simply and getting progressively more complex.

21.3.4 Known symbols

Our first step is to create an environment that will convert the special LaTeX symbols used for Greek characters, e.g., pi to \pi. We’ll use the trick from Section 20.4.3 to bind the symbol pi to the value "\pi".

greek <- c(
  "alpha", "theta", "tau", "beta", "vartheta", "pi", "upsilon",
  "gamma", "varpi", "phi", "delta", "kappa", "rho",
  "varphi", "epsilon", "lambda", "varrho", "chi", "varepsilon",
  "mu", "sigma", "psi", "zeta", "nu", "varsigma", "omega", "eta",
  "xi", "Gamma", "Lambda", "Sigma", "Psi", "Delta", "Xi",
  "Upsilon", "Omega", "Theta", "Pi", "Phi"
)
greek_list <- set_names(paste0("\\", greek), greek)
greek_env <- as_environment(greek_list)

We can then check it:

latex_env <- function(expr) {
  greek_env
}

to_math(pi)
#> <LATEX> \pi
to_math(beta)
#> <LATEX> \beta

Looks good so far!
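
The mechanism behind this check can be seen with base R alone. The sketch below is not the code used in this section: it swaps rlang's helpers for list2env() and eval(), and `symbols` is a small hypothetical subset of the full Greek vector.

```r
# Base R sketch: bind each name to its LaTeX command in an environment.
# `symbols` is a hypothetical subset chosen for illustration.
symbols <- c("alpha", "beta", "pi")
bindings <- as.list(setNames(paste0("\\", symbols), symbols))
env <- list2env(bindings, parent = emptyenv())

# Evaluating a bare symbol looks it up in `env`, not in base R,
# so `pi` yields a string rather than the number 3.14159...
eval(quote(pi), env)
#> [1] "\\pi"
eval(quote(beta), env)
#> [1] "\\beta"
```

A symbol that is not bound in the environment, such as x, would raise an "object not found" error here.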

21.3.5 Unknown symbols

If a symbol isn’t Greek, we want to leave it as is. This is tricky because we don’t know in advance what symbols will be used, and we can’t possibly generate them all. Instead, we’ll use the approach described in Section 18.5: walking the AST to find all symbols. This gives us all_names_rec() and helper all_names():

all_names_rec <- function(x) {
  switch_expr(x,
    constant = character(),
    symbol =   as.character(x),
    call =     flat_map_chr(as.list(x[-1]), all_names)
  )
}

all_names <- function(x) {
  unique(all_names_rec(x))
}

all_names(expr(x + y + f(a, b, c, 10)))
#> [1] "x" "y" "a" "b" "c"
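
Because switch_expr() and flat_map_chr() are helpers from Section 18.5 rather than part of this section, a self-contained version may be useful for experimenting. The following base R sketch performs the same traversal; all_names_base() is a hypothetical name, and the use of is.symbol() and is.call() in place of switch_expr() is an assumption about how one might write it, not the approach used above.

```r
# Base R sketch of the same AST walk.
all_names_base <- function(x) {
  if (is.symbol(x)) {
    as.character(x)
  } else if (is.call(x)) {
    # Drop the first element (the function being called) and
    # recurse into the arguments only.
    args <- as.list(x)[-1]
    found <- unlist(lapply(args, all_names_base), use.names = FALSE)
    # as.character() turns the NULL from a zero-argument call
    # into an empty character vector.
    as.character(unique(found))
  } else {
    # Constants such as 10 or "a" contribute no names
    character()
  }
}

all_names_base(quote(x + y + f(a, b, c, 10)))
#> [1] "x" "y" "a" "b" "c"
```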

We now want to take that list of symbols and convert it to an environment so that each symbol is mapped to its corresponding string representation (e.g., so eval(quote(x), env) yields "x"). We again use the pattern of converting a named character vector to a list, then converting the list to an environment.

latex_env <- function(expr) {
  names <- all_names(expr)
  symbol_env <- as_environment(set_names(names))

  symbol_env
}

to_math(x)
#> <LATEX> x
to_math(longvariablename)
#> <LATEX> longvariablename
to_math(pi)
#> <LATEX> pi

This works, but we need to combine it with the Greek symbols environment. Since we want to give preference to Greek over defaults (e.g., to_math(pi) should give "\\pi", not "pi"), symbol_env needs to be the parent of greek_env. To do that, we need to make a copy of greek_env with a new parent. This gives us a function that can convert both known (Greek) and unknown symbols.

latex_env <- function(expr) {
  # Unknown symbols
  names <- all_names(expr)
  symbol_env <- as_environment(set_names(names))

  # Known symbols
  env_clone(greek_env, parent = symbol_env)
}

to_math(x)
#> <LATEX> x
to_math(longvariablename)
#> <LATEX> longvariablename
to_math(pi)
#> <LATEX> \pi
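
The lookup order is what makes this work: R searches the environment it is given first and only then moves to the parent. A minimal base R sketch of the two layers, with hypothetical names and only a handful of bindings, might look like this:

```r
# Fallback layer: every symbol maps to itself.
symbol_env <- list2env(
  list(x = "x", pi = "pi"),
  parent = emptyenv()
)

# Preferred layer: known symbols, with the fallback as parent.
known_env <- list2env(
  list(pi = "\\pi"),
  parent = symbol_env
)

# `pi` is found in known_env first; `x` falls through to symbol_env.
eval(quote(pi), known_env)
#> [1] "\\pi"
eval(quote(x), known_env)
#> [1] "x"
```

Swapping the two layers would make the fallback binding win, so pi would evaluate to "pi".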

21.3.6 Known functions

Next we’ll add functions to our DSL. We’ll start with a couple of helpers that make it easy to add new unary and binary operators. These functions are very simple: they only assemble strings.

unary_op <- function(left, right) {
  new_function(
    exprs(e1 = ),
    expr(
      paste0(!!left, e1, !!right)
    ),
    caller_env()
  )
}

binary_op <- function(sep) {
  new_function(
    exprs(e1 = , e2 = ),
    expr(
      paste0(e1, !!sep, e2)
    ),
    caller_env()
  )
}

unary_op("\\sqrt{", "}")
#> function (e1) 
#> paste0("\\sqrt{", e1, "}")
binary_op("+")
#> function (e1, e2) 
#> paste0(e1, "+", e2)
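
Unquoting with !! inlines the strings into the function body, which is why the printed functions above are so readable. For comparison, here is a sketch of the same helpers written as ordinary closures in base R. The names unary_op_closure() and binary_op_closure() are hypothetical; they behave the same way when called, but printing them would show the variable names left, right, and sep rather than their values.

```r
unary_op_closure <- function(left, right) {
  # force() evaluates the arguments now, so the returned function
  # does not depend on later changes in the caller.
  force(left)
  force(right)
  function(e1) paste0(left, e1, right)
}

binary_op_closure <- function(sep) {
  force(sep)
  function(e1, e2) paste0(e1, sep, e2)
}

sqrt_tex <- unary_op_closure("\\sqrt{", "}")
plus_tex <- binary_op_closure(" + ")

# The helpers only assemble strings, so they compose freely
sqrt_tex(plus_tex("a", "b"))
#> [1] "\\sqrt{a + b}"
```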

Using these helpers, we can define a few illustrative mappings from R to LaTeX. Note that with R’s lexical scoping rules helping us, we can easily provide new meanings for standard functions like +, -, and *, and even ( and {.

# Binary operators
f_env <- child_env(
  .parent = empty_env(),
  `+` = binary_op(" + "),
  `-` = binary_op(" - "),
  `*` = binary_op(" * "),
  `/` = binary_op(" / "),
  `^` = binary_op("^"),
  `[` = binary_op("_"),

  # Grouping
  # LaTeX needs the braces escaped: \left\{ ... \right\}
  `{` = unary_op("\\left\\{ ", " \\right\\}"),
  `(` = unary_op("\\left( ", " \\right)"),
  paste = paste,

  # Other math functions
  sqrt = unary_op("\\sqrt{", "}"),
  sin =  unary_op("\\sin(", ")"),
  log =  unary_op("\\log(", ")"),
  abs =  unary_op("\\left| ", "\\right| "),
  frac = function(a, b) {
    paste0("\\frac{", a, "}{", b, "}")
  },

  # Labelling
  hat =   unary_op("\\hat{", "}"),
  tilde = unary_op("\\tilde{", "}")
)

We again modify latex_env() to include this environment. It should be the last environment R looks for names in so that expressions like sin(sin) will work.

latex_env <- function(expr) {
  # Known functions live in f_env, the last environment searched

  # Default symbols
  names <- all_names(expr)
  symbol_env <- as_environment(set_names(names), parent = f_env)

  # Known symbols
  greek_env <- env_clone(greek_env, parent = symbol_env)

  greek_env
}

to_math(sin(x + pi))
#> <LATEX> \sin(x + \pi)
to_math(log(x[i]^2))
#> <LATEX> \log(x_i^2)
to_math(sin(sin))
#> <LATEX> \sin(sin)
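
One consequence of evaluating the parsed expression is that operator precedence and explicit parentheses are already encoded in the AST, so the translation functions never have to reason about them. The base R sketch below uses a tiny hypothetical environment (two operators, parentheses, and three symbols that evaluate to their own names) to show this.

```r
# Minimal stand-in for the evaluation environment; names are illustrative.
mini_env <- new.env(parent = emptyenv())
mini_env$`+` <- function(e1, e2) paste0(e1, " + ", e2)
mini_env$`*` <- function(e1, e2) paste0(e1, " * ", e2)
mini_env$`(` <- function(e1) paste0("\\left( ", e1, " \\right)")
for (nm in c("a", "b", "c")) assign(nm, nm, envir = mini_env)

# b * c is evaluated first because the parser nests it inside `+`
eval(quote(a + b * c), mini_env)
#> [1] "a + b * c"

# Explicit parentheses are a call to `(`, so they are translated too
eval(quote((a + b) * c), mini_env)
#> [1] "\\left( a + b \\right) * c"
```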

21.3.7 Unknown functions

Finally, we’ll add a default for functions that we don’t yet know about. We can’t know in advance what the unknown functions will be, so we again walk the AST to find them:

all_calls_rec <- function(x) {
  switch_expr(x,
    constant = ,
    symbol =   character(),
    call = {
      fname <- as.character(x[[1]])
      children <- flat_map_chr(as.list(x[-1]), all_calls)
      c(fname, children)
    }
  )
}
all_calls <- function(x) {
  unique(all_calls_rec(x))
}

all_calls(expr(f(g + b, c, d(a))))
#> [1] "f" "+" "d"

We need a closure that will generate the functions for each unknown call:

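# c(...) gathers the arguments into a single vector so that `collapse`
# joins them with ", "; paste(..., collapse = ", ") would instead join
# separate length-one arguments with a space.
unknown_op <- function(op) {
  new_function(
    exprs(... = ),
    expr({
      contents <- paste(c(...), collapse = ", ")
      paste0(!!paste0("\\mathrm{", op, "}("), contents, ")")
    })
  )
}
unknown_op("foo")
#> function (...) 
#> {
#>     contents <- paste(c(...), collapse = ", ")
#>     paste0("\\mathrm{foo}(", contents, ")")
#> }
#> <environment: 0x55cc90472518>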
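
One detail to watch in a variadic wrapper like this is how several arguments are joined: collapse only acts across the elements of a vector, so the arguments need to be gathered into one vector before pasting. The following base R sketch shows the idea with a hypothetical closure-based variant, unknown_op_closure(), and a small environment built by hand; it is an illustration, not the function defined in this section.

```r
unknown_op_closure <- function(op) {
  force(op)
  function(...) {
    # c(...) gathers the arguments into one vector, so that
    # `collapse` has several elements to join.
    contents <- paste(c(...), collapse = ", ")
    paste0("\\mathrm{", op, "}(", contents, ")")
  }
}

# Build one wrapper per unknown call name and evaluate in that environment.
calls <- c("f", "g")
call_env <- list2env(
  lapply(setNames(calls, calls), unknown_op_closure),
  parent = emptyenv()
)
for (nm in c("a", "b")) assign(nm, nm, envir = call_env)

eval(quote(f(a, g(b))), call_env)
#> [1] "\\mathrm{f}(a, \\mathrm{g}(b))"
```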

And again we update latex_env():

latex_env <- function(expr) {
  # Unknown functions
  calls <- all_calls(expr)
  call_list <- map(set_names(calls), unknown_op)
  call_env <- as_environment(call_list)

  # Known functions
  f_env <- env_clone(f_env, parent = call_env)

  # Default symbols
  names <- all_names(expr)
  symbol_env <- as_environment(set_names(names), parent = f_env)

  # Known symbols
  greek_env <- env_clone(greek_env, parent = symbol_env)
  greek_env
}

This completes our original requirements:

to_math(sin(pi) + f(a))
#> <LATEX> \sin(\pi) + \mathrm{f}(a)

You could certainly take this idea further and translate other types of mathematical expressions, but you should not need any additional metaprogramming tools.

21.3.8 Exercises

  1. Add escaping. The special symbols that should be escaped by adding a backslash in front of them are \, $, and %. Just as with HTML, you’ll need to make sure you don’t end up double-escaping. So you’ll need to create a small S3 class and then use that in function operators. That will also allow you to embed arbitrary LaTeX if needed.

  2. Complete the DSL to support all the functions that plotmath supports.