...regarding execution time and / or memory.
If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must c
When applying functions over subsets of a vector, tapply
can be pretty faster than a for loop. Example:
df <- data.frame(id = rep(letters[1:10], 100000),
value = rnorm(1000000))
f1 <- function(x)
tapply(x$value, x$id, sum)
f2 <- function(x){
res <- 0
for(i in seq_along(l <- unique(x$id)))
res[i] <- sum(x$value[x$id == l[i]])
names(res) <- l
res
}
library(microbenchmark)
> microbenchmark(f1(df), f2(df), times=100)
Unit: milliseconds
expr min lq median uq max neval
f1(df) 28.02612 28.28589 28.46822 29.20458 32.54656 100
f2(df) 38.02241 41.42277 41.80008 42.05954 45.94273 100
apply
, however, in most situation doesn't provide any speed increase, and in some cases can be even lot slower:
mat <- matrix(rnorm(1000000), nrow=1000)
f3 <- function(x)
apply(x, 2, sum)
f4 <- function(x){
res <- 0
for(i in 1:ncol(x))
res[i] <- sum(x[,i])
res
}
> microbenchmark(f3(mat), f4(mat), times=100)
Unit: milliseconds
expr min lq median uq max neval
f3(mat) 14.87594 15.44183 15.87897 17.93040 19.14975 100
f4(mat) 12.01614 12.19718 12.40003 15.00919 40.59100 100
But for these situations we've got colSums
and rowSums
:
f5 <- function(x)
colSums(x)
> microbenchmark(f5(mat), times=100)
Unit: milliseconds
expr min lq median uq max neval
f5(mat) 1.362388 1.405203 1.413702 1.434388 1.992909 100