Cumulative sum in a matrix

匿名 (未验证) 提交于 2019-12-03 08:35:02

问题:

I have a matrix like

A= [ 1 2 4      2 3 1      3 1 2 ] 

and I would like to calculate its cumulative sum by row and by column, that is, I want the result to be

B = [ 1  3  7        3  8  13        6  12 19 ] 

Any ideas of how to make this in R in a fast way? (Possibly using the function cumsum) (I have huge matrices)

Thanks!

回答1:

A one-liner:

t(apply(apply(A, 2, cumsum)), 1, cumsum)) 

The underlying observation is that you can first compute the cumulative sums over the columns and then the cumulative sum of this matrix over the rows.

Note: When doing the rows, you have to transpose the resulting matrix.

Your example:

> apply(A, 2, cumsum)      [,1] [,2] [,3] [1,]    1    2    4 [2,]    3    5    5 [3,]    6    6    7  > t(apply(apply(A, 2, cumsum), 1, cumsum))      [,1] [,2] [,3] [1,]    1    3    7 [2,]    3    8   13 [3,]    6   12   19 

About performance: I have now idea how good this approach scales to big matrices. Complexity-wise, this should be close to optimal. Usually, apply is not that bad in performance as well.


Edit

Now I was getting curious - what approach is the better one? A short benchmark:

> A <- matrix(runif(1000*1000, 1, 500), 1000) >  > system.time( +   B <- t(apply(apply(A, 2, cumsum), 1, cumsum)) + )        User      System     elapsed        0.082       0.011       0.093  >  > system.time( +   C <- lower.tri(diag(nrow(A)), diag = TRUE) %*% A %*% upper.tri(diag(ncol(A)), diag = TRUE) + )        User      System     elapsed        1.519       0.016       1.530  

Thus: Apply outperforms matrix multiplication by a factor of 15. (Just for comparision: MATLAB needed 0.10719 seconds.) The results do not really surprise, as the apply-version can be done in O(n^2), while the matrix multiplication will need approx. O(n^2.7) computations. Thus, all optimizations that matrix multiplication offers should be lost if n is big enough.



回答2:

Here is a more efficient implementation using the matrixStats package and a larger example matrix:

library(matrixStats) A <- matrix(runif(10000*10000, 1, 500), 10000)  # Thilo's answer system.time(B <- t(apply(apply(A, 2, cumsum), 1, cumsum))) user  system elapsed  3.684   0.504   4.201  # using matrixStats system.time(C <- colCumsums(rowCumsums(A))) user  system elapsed  0.164   0.068   0.233   all.equal(B, C) [1] TRUE 


标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!