I am working with multidimensional arrays in both R and MATLAB; these arrays have five dimensions (about 14.5M elements in total). I have to remove a dimension applying an arith
In R, apply is not the right tool for the task. If you had a matrix and needed the row or column means, you would use the much much faster, vectorized rowMeans and colMeans. You can still use these for a multi-dimensional array but you need to be a little creative:
Assuming your array has n dimensions, and you want to compute means along dimension i:
1. aperm to move the dimension i to the last position n
2. rowMeans with dims = n - 1
Similarly, you could:
1. aperm to move the dimension i to the first position
2. colMeans with dims = 1
a <- array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))
means.along <- function(a, i) {
n <- length(dim(a))
b <- aperm(a, c(seq_len(n)[-i], i))
rowMeans(b, dims = n - 1)
}
system.time(z1 <- apply(a, c(1,2,4,5), mean))
# user system elapsed
# 25.132 0.109 25.239
system.time(z2 <- means.along(a, 3))
# user system elapsed
# 0.283 0.007 0.289
identical(z1, z2)
# [1] TRUE
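The colMeans alternative described above can be sketched in the same way. This is only an illustration on a small array: the helper name means.along.col and the test array small are made up for the example and are not part of the original answer.

```r
# Hypothetical helper (name is an assumption): move dimension i to the
# first position, then average over it with colMeans.
means.along.col <- function(a, i) {
  n <- length(dim(a))
  # Put dimension i first, keep the remaining dimensions in order.
  b <- aperm(a, c(i, seq_len(n)[-i]))
  # dims = 1 averages over the first dimension only.
  colMeans(b, dims = 1)
}

# Small illustrative array so the apply() reference stays cheap.
set.seed(1)
small <- array(runif(4 * 3 * 5 * 2), dim = c(4, 3, 5, 2))

res <- means.along.col(small, 3)
ref <- apply(small, c(1, 2, 4), mean)

dim(res)
all.equal(res, ref)
```

Both variants need one aperm copy of the array; which one to use is mostly a matter of taste.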
mean is particularly slow because of S3 method dispatch. This is faster:
set.seed(42)
a = array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))
system.time({b = apply(a, c(1,2,4,5), mean.default)})
# user system elapsed
# 16.80 0.03 16.94
If you don't need to handle NAs you can use the internal function:
system.time({b1 = apply(a, c(1,2,4,5), function(x) .Internal(mean(x)))})
# user system elapsed
# 6.80 0.04 6.86
For comparison:
system.time({b2 = apply(a, c(1,2,4,5), function(x) sum(x)/length(x))})
# user system elapsed
# 9.05 0.01 9.08
system.time({b3 = apply(a, c(1,2,4,5), sum)
b3 = b3/dim(a)[[3]]})
# user system elapsed
# 7.44 0.03 7.47
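On the NA caveat mentioned above: if missing values do matter, one option is the vectorized aperm plus rowMeans route, since rowMeans accepts na.rm = TRUE. The following is a minimal sketch on a made-up 2 x 3 x 4 array (the object names are assumptions for the example), averaging over the second dimension.

```r
# Small illustrative array with a single missing value.
small <- array(as.numeric(1:24), dim = c(2, 3, 4))
small[1, 1, 2] <- NA

n <- length(dim(small))

# Move dimension 2 to the last position, then average over it,
# dropping NAs inside each cell's slice.
b   <- aperm(small, c(1, 3, 2))
res <- rowMeans(b, na.rm = TRUE, dims = n - 1)

# Reference results via apply(), with and without NA removal.
ref   <- apply(small, c(1, 3), mean, na.rm = TRUE)
plain <- apply(small, c(1, 3), mean)

all.equal(res, ref)
anyNA(plain)
```

Without na.rm = TRUE, the cell whose slice contains the missing value comes back as NA, which is what plain shows.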
(Note that all timings are only approximate. Proper benchmarking would require running this repeatedly, e.g., using one of the benchmarking packages. But I'm not patient enough for that right now.)
It might be possible to speed this up with an Rcpp implementation.
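As a rough illustration of the repeated timing mentioned in the note above, here is a base-R sketch that runs each variant several times and keeps the median elapsed time. The helper time.median, its times argument, and the small test array are assumptions made for this example; a dedicated benchmarking package would give more careful measurements.

```r
# Hypothetical helper (name is an assumption): call f() `times` times
# and return the median elapsed time in seconds.
time.median <- function(f, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(k) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Deliberately small array so the sketch finishes quickly.
set.seed(42)
small <- array(runif(20 * 10 * 5 * 4), dim = c(20, 10, 5, 4))

t.mean    <- time.median(function() apply(small, c(1, 2, 4), mean))
t.default <- time.median(function() apply(small, c(1, 2, 4), mean.default))

c(mean = t.mean, mean.default = t.default)
```

On an array this small the differences may be lost in timer resolution; the full-size array from the question would show them more clearly, at the cost of a much longer run.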