I am working with multidimensional arrays in both R and MATLAB; these arrays have five dimensions (14.5M elements in total). I have to remove a dimension applying an arith
mean is particularly slow because of S3 method dispatch. This is faster:
set.seed(42)
a = array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))
system.time({b = apply(a, c(1,2,4,5), mean.default)})
# user system elapsed
#16.80 0.03 16.94
If you don't need to handle NAs, you can use the internal function:
system.time({b1 = apply(a, c(1,2,4,5), function(x) .Internal(mean(x)))})
# user system elapsed
# 6.80 0.04 6.86
For comparison:
system.time({b2 = apply(a, c(1,2,4,5), function(x) sum(x)/length(x))})
# user system elapsed
# 9.05 0.01 9.08
system.time({b3 = apply(a, c(1,2,4,5), sum)
b3 = b3/dim(a)[[3]]})
# user system elapsed
# 7.44 0.03 7.47
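Along the same lines as b3, the per-cell function call can probably be avoided altogether by moving the averaged dimension to the end with aperm and calling rowMeans with dims = 4. The sketch below only checks the result against apply on a small array; the array size and the names a_small, ref and b4 are made up for illustration, and it is not timed here.

```r
set.seed(42)
# Small stand-in for the 144 x 73 x 10 x 6 x 23 array (sizes chosen for illustration)
a_small = array(runif(6*5*4*3*2), dim = c(6, 5, 4, 3, 2))

# Reference result: mean over dimension 3, one function call per cell
ref = apply(a_small, c(1, 2, 4, 5), mean)

# Move dimension 3 to the end, then average over it in a single vectorized call;
# dims = 4 tells rowMeans to treat the first four dimensions as "rows"
b4 = rowMeans(aperm(a_small, c(1, 2, 4, 5, 3)), dims = 4)

# Both results should have dimensions 6 x 5 x 3 x 2 and agree up to rounding
stopifnot(isTRUE(all.equal(ref, b4)))
```

Note that aperm makes a full copy of the array, so this trades memory for speed.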
(Note that all timings are only approximate. Proper benchmarking would require running this repeatedly, e.g., using one of the benchmarking packages. But I'm not patient enough for that right now.)
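As a rough illustration of repeated timing with base R only (a dedicated benchmarking package would be the more usual choice), the following sketch runs two of the variants several times on a smaller array and reports the median elapsed time. The helper name time_it, the array size and the number of repetitions are assumptions made for this example.

```r
set.seed(42)
# Smaller array so that repeated runs finish quickly (size chosen for illustration)
a_small = array(runif(20*20*10*6*5), dim = c(20, 20, 10, 6, 5))

# Hypothetical helper: call f() `times` times and return the median elapsed seconds
time_it = function(f, times = 5) {
  median(replicate(times, system.time(f())[["elapsed"]]))
}

timings = c(
  mean_default = time_it(function() apply(a_small, c(1,2,4,5), mean.default)),
  sum_divide   = time_it(function() apply(a_small, c(1,2,4,5), sum) / dim(a_small)[[3]])
)
print(timings)
```

The median is used rather than the mean so that a single slow run (e.g., one hit by garbage collection) does not dominate the result.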
It might be possible to speed this up with an Rcpp implementation.
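A minimal sketch of what such an Rcpp version could look like, assuming Rcpp and a working C++ compiler are installed. The function name mean_dim3 is hypothetical, it is hard-wired to averaging a five-dimensional array over its third dimension, it does no input checking, and it has not been timed against the variants above.

```r
library(Rcpp)  # assumes Rcpp and a C++ toolchain are available

# mean_dim3 is a hypothetical name; it averages a 5-d array over dimension 3
cppFunction('
NumericVector mean_dim3(NumericVector x, IntegerVector d) {
  // R stores arrays in column-major order, so dims 1-2 vary fastest
  R_xlen_t inner = (R_xlen_t) d[0] * d[1];
  R_xlen_t mid   = d[2];                      // dimension being averaged
  R_xlen_t outer = (R_xlen_t) d[3] * d[4];
  NumericVector out(inner * outer);           // initialised to zero
  for (R_xlen_t o = 0; o < outer; ++o)
    for (R_xlen_t m = 0; m < mid; ++m)
      for (R_xlen_t i = 0; i < inner; ++i)
        out[i + inner * o] += x[i + inner * (m + mid * o)];
  for (R_xlen_t k = 0; k < out.size(); ++k) out[k] /= mid;
  out.attr("dim") = IntegerVector::create(d[0], d[1], d[3], d[4]);
  return out;
}')

# Check against apply on a small array (sizes chosen for illustration)
set.seed(42)
a_small = array(runif(6*5*4*3*2), dim = c(6, 5, 4, 3, 2))
res = mean_dim3(a_small, dim(a_small))
ref = apply(a_small, c(1, 2, 4, 5), mean)
stopifnot(isTRUE(all.equal(res, ref)))
```

The loop order keeps the innermost index contiguous in memory, which is usually the cache-friendly choice for column-major data. Like the .Internal variant, this sketch has no special handling for NAs.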