I am working with multidimensional arrays in both R and MATLAB; these arrays have five dimensions (14.5M elements in total). I have to remove a dimension applying an arith
In R, `apply` is not the right tool for the task. If you had a matrix and needed the row or column means, you would use the much, much faster, vectorized `rowMeans` and `colMeans`. You can still use these for a multi-dimensional array, but you need to be a little creative:
Assuming your array has `n` dimensions and you want to compute means along dimension `i`:

1. use `aperm` to move dimension `i` to the last position, `n`;
2. use `rowMeans` with `dims = n - 1`.
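
To see what `dims` does, here is a small illustration with a hypothetical 2 × 3 × 4 array `x` (not from the original). `rowMeans(x, dims = 2)` keeps the first two dimensions and averages over the third, which is the same result `apply(x, c(1, 2), mean)` computes more slowly.

```r
# Small example array: 2 rows, 3 columns, 4 "layers"
x <- array(seq_len(24), dim = c(2, 3, 4))

# dims = 2: the first two dimensions are kept, the remaining one is averaged
fast <- rowMeans(x, dims = 2)
slow <- apply(x, c(1, 2), mean)

dim(fast)
# [1] 2 3
all.equal(fast, slow)
# [1] TRUE
```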
Similarly, you could:

1. use `aperm` to move dimension `i` to the first position;
2. use `colMeans` with `dims = 1`.
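
The `colMeans` route can be sketched the same way. `means.along.first` is a hypothetical helper name: it moves dimension `i` to the front and lets `colMeans(..., dims = 1)` average over it. The check below uses a small random array rather than the full 14.5M-element one.

```r
# Mean along dimension i via colMeans (hypothetical helper, sketch only)
means.along.first <- function(a, i) {
  n <- length(dim(a))
  # Permutation that puts dimension i first, keeping the others in order
  b <- aperm(a, c(i, seq_len(n)[-i]))
  # dims = 1: average over the first dimension only
  colMeans(b, dims = 1)
}

set.seed(1)
a.small <- array(runif(4 * 5 * 3 * 2), dim = c(4, 5, 3, 2))
z.fast <- means.along.first(a.small, 3)
z.slow <- apply(a.small, c(1, 2, 4), mean)
all.equal(z.fast, z.slow)
# [1] TRUE
```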

```r
a <- array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))

# Mean of array `a` along dimension `i`
means.along <- function(a, i) {
  n <- length(dim(a))
  # Move dimension i to the last position, keeping the others in order
  b <- aperm(a, c(seq_len(n)[-i], i))
  # Keep the first n - 1 dimensions and average over the last one
  rowMeans(b, dims = n - 1)
}

system.time(z1 <- apply(a, c(1,2,4,5), mean))
# user system elapsed
# 25.132 0.109 25.239

system.time(z2 <- means.along(a, 3))
# user system elapsed
# 0.283 0.007 0.289

identical(z1, z2)
# [1] TRUE
```
`mean` is particularly slow because of S3 method dispatch. This is faster:

```r
set.seed(42)
a = array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))
system.time({b = apply(a, c(1,2,4,5), mean.default)})
# user system elapsed
# 16.80 0.03 16.94
```
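
One way to see the dispatch step: `mean` itself is only a thin generic whose body calls `UseMethod`, so every call first has to look up a method (for a plain numeric vector, `mean.default`) before any averaging happens. Calling `mean.default` directly, as above, skips that lookup while giving the same result.

```r
# The generic only dispatches; the actual work happens in the method
body(mean)
# UseMethod("mean")

x <- runif(10)
# Same answer with or without going through dispatch
identical(mean(x), mean.default(x))
# [1] TRUE
```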
If you don't need to handle `NA`s, you can use the internal function:

```r
system.time({b1 = apply(a, c(1,2,4,5), function(x) .Internal(mean(x)))})
# user system elapsed
# 6.80 0.04 6.86
```
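
The trade-off is what `mean.default` does before it reaches the internal code, such as handling `na.rm`. A quick check with a vector containing an `NA` (the wrapper name `fast.mean` is hypothetical):

```r
# Thin wrapper around the internal mean (hypothetical name)
fast.mean <- function(x) .Internal(mean(x))

v <- c(1, NA, 3)
fast.mean(v)                   # a missing value: NAs propagate, no na.rm here
mean.default(v, na.rm = TRUE)  # 2
fast.mean(v[!is.na(v)])        # 2: drop NAs yourself if you need this path
```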
For comparison:

```r
system.time({b2 = apply(a, c(1,2,4,5), function(x) sum(x)/length(x))})
# user system elapsed
# 9.05 0.01 9.08

system.time({b3 = apply(a, c(1,2,4,5), sum)
             b3 = b3/dim(a)[[3]]})
# user system elapsed
# 7.44 0.03 7.47
```
(Note that all timings are only approximate. Proper benchmarking would require running this repeatedly, e.g., using one of the benchmarking packages. But I'm not patient enough for that right now.)
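
A minimal base-R sketch of repeated timing, assuming a much smaller array (a hypothetical size) so each run finishes quickly. The helper `time.reps` is a hypothetical name; dedicated benchmarking packages handle warm-up and clock resolution more carefully.

```r
set.seed(42)
# Much smaller than the array in the question, so repeated runs stay quick
a.small <- array(runif(40 * 20 * 10 * 6 * 5), dim = c(40, 20, 10, 6, 5))

# Run a zero-argument function several times, collecting elapsed seconds
time.reps <- function(expr.fun, reps = 5) {
  replicate(reps, system.time(expr.fun())[["elapsed"]])
}

t.default <- time.reps(function() apply(a.small, c(1, 2, 4, 5), mean.default))
t.sum     <- time.reps(function() apply(a.small, c(1, 2, 4, 5), sum) / dim(a.small)[[3]])

# Compare medians rather than a single run
median(t.default)
median(t.sum)
```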
It might be possible to speed this up with an Rcpp implementation.