Why is `vapply` safer than `sapply`?

风流意气都作罢 提交于 2019-11-26 18:27:37
Ari B. Friedman

As has already been noted, vapply does two things:

  • Slight speed improvement
  • Improves consistency by providing limited return type checks.

The second point is the greater advantage, as it helps catch errors before they happen and leads to more robust code. This return value checking could be done separately by using sapply followed by stopifnot to make sure that the return values are consistent with what you expected, but vapply is a little easier (if more limited, since custom error checking code could check for values within bounds, etc.).

Here's an example of vapply ensuring your result is as expected. This parallels something I was just working on while PDF scraping, where findD would use a to match a pattern in raw text data (e.g. I'd have a list that was split by entity, and a regex to match addresses within each entity. Occasionally the PDF had been converted out-of-order and there would be two addresses for an entity, which caused badness).

> input1 <- list( letters[1:5], letters[3:12], letters[c(5,2,4,7,1)] )
> input2 <- list( letters[1:5], letters[3:12], letters[c(2,5,4,7,15,4)] )
> findD <- function(x) x[x=="d"]
> sapply(input1, findD )
[1] "d" "d" "d"
> sapply(input2, findD )
[[1]]
[1] "d"

[[2]]
[1] "d"

[[3]]
[1] "d" "d"

> vapply(input1, findD, "" )
[1] "d" "d" "d"
> vapply(input2, findD, "" )
Error in vapply(input2, findD, "") : values must be length 1,
 but FUN(X[[3]]) result is length 2

As I tell my students, part of becoming a programmer is changing your mindset from "errors are annoying" to "errors are my friend."

Zero length inputs
One related point is that if the input length is zero, sapply will always return an empty list, regardless of the input type. Compare:

sapply(1:5, identity)
## [1] 1 2 3 4 5
sapply(integer(), identity)
## list()    
vapply(1:5, identity)
## [1] 1 2 3 4 5
vapply(integer(), identity)
## integer(0)

With vapply, you are guaranteed to have a particular type of output, so you don't need to write extra checks for zero length inputs.

Benchmarks

vapply can be a bit faster because it already knows what format it should be expecting the results in.

input1.long <- rep(input1,10000)

library(microbenchmark)
m <- microbenchmark(
  sapply(input1.long, findD ),
  vapply(input1.long, findD, "" )
)
library(ggplot2)
library(taRifx) # autoplot.microbenchmark is moving to the microbenchmark package in the next release so this should be unnecessary soon
autoplot(m)

The extra key strokes involved with vapply could save you time debugging confusing results later. If the function you're calling can return different datatypes, vapply should certainly be used.

One example that comes to mind would be sqlQuery in the RODBC package. If there's an error executing a query, this function returns a character vector with the message. So, for example, say you're trying to iterate over a vector of table names tnames and select the max value from the numeric column 'NumCol' in each table with:

sapply(tnames, 
   function(tname) sqlQuery(cnxn, paste("SELECT MAX(NumCol) FROM", tname))[[1]])

If all the table names are valid, this would result in a numeric vector. But if one of the table names happens to change in the database and the query fails, the results are going to be coerced into mode character. Using vapply with FUN.VALUE=numeric(1), however, will stop the error here and prevent it from popping up somewhere down the line---or worse, not at all.

user1317221_G

If you always want your result to be something in particular...e.g. a logical vector. vapply makes sure this happens but sapply does not necessarily do so.

a<-vapply(NULL, is.factor, FUN.VALUE=logical(1))
b<-sapply(NULL, is.factor)

is.logical(a)
is.logical(b)
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!