Sliding window on each column of a matrix in R with parallel processing

Posted by 早过忘川 on 2019-12-23 05:41:30

Question


I have a large matrix with 2000 columns and 3000 rows. For each column, I want to run a sliding window that sums 15 rows, then moves down one row and sums the next 15, and so on, and to build a new matrix from the results. I have a function that works (although it seems a bit slow), but I would like to run it in parallel: this is part of a larger script, and if I use apply functions without their parallel equivalents, the open cluster shuts down. Moreover, I have to do this whole operation 100 times. Later in the script, I use parLapply or parSapply.

Here is the code I have in a non-parallel fashion:

# df is a matrix with 2000 columns and 3000 rows, all numeric and no NAs

size <- 15 # size of window
len <- nrow(df) - size + 1 # number of sliding windows to perform 

sumsmatrix <- apply(df, 2, function(x) {
  result <- sapply(1:len, function(y) {
    sum(x[y:(y + size - 1)])
  })
  return(result)
})

Thanks in advance. Ron


Answer 1:


Try using cumsum; that way you won't have to sum the same numbers over and over again.

sumsmatrix <- apply(df, 2, function(x) {
  cs <- cumsum(x)  # running total, computed once per column
  cs[size:nrow(df)] - c(0, cs[seq_len(len - 1)])
})

It should be about 100 times faster than what you were doing.
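
As a rough check that the cumsum trick gives the same numbers as the original nested loops, here is a minimal sketch on a small made-up matrix (the 200 x 10 size, the seed and the names slow, fast and cs are assumptions for illustration, not part of the question). It also takes the running totals for the whole matrix at once, so the subtraction becomes a single matrix operation.

```r
# Small made-up matrix so the check runs quickly (the real df is 3000 x 2000).
set.seed(1)
df <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
size <- 15
len <- nrow(df) - size + 1

# Original approach: re-sum every window from scratch.
slow <- apply(df, 2, function(x) {
  sapply(1:len, function(y) sum(x[y:(y + size - 1)]))
})

# Column-wise running totals for the whole matrix.
cs <- apply(df, 2, cumsum)

# Window sum = running total at the window's end minus the running total
# just before its start (a row of zeros for the first window).
fast <- cs[size:nrow(df), , drop = FALSE] -
  rbind(0, cs[seq_len(len - 1), , drop = FALSE])

# Equal up to floating-point tolerance.
stopifnot(isTRUE(all.equal(slow, fast)))
```

Wrapping each version in system.time() on your own data is one way to see how large the speed-up actually is. Note that with non-integer data, differences of long running totals can carry slightly more rounding error than direct sums, which is why the comparison uses all.equal rather than exact equality.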


Here's how it works:

To make it easier to follow, suppose your x has only 5 elements and your window size is 3.

x <- 1:5
x
# [1] 1 2 3 4 5
cumsum(x)
# [1]  1  3  6 10 15

So, the third number of cumsum(x) is what you want for the first sum, but the fourth and fifth numbers are too big, because they include the first few numbers, which are no longer part of the window. So, you subtract from each of them the running total of the numbers that come before its window.

cumsum(x)[3:5]
# [1]  6 10 15
cumsum(x)[1:2]
# [1] 1 3

But for the first window there is nothing before it, so you need to subtract zero.

cumsum(x)[3:5]
# [1]  6 10 15
c(0,cumsum(x)[1:2])
# [1] 0 1 3
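
To finish the toy example, the subtraction itself can be sketched as follows; the names cs, window_sums and direct are made up for illustration. The last lines compare the result against summing each window directly.

```r
x <- 1:5
size <- 3
cs <- cumsum(x)

# Running total at each window's end, minus the running total just before
# each window's start (0 for the first window).
window_sums <- cs[size:length(x)] - c(0, cs[seq_len(length(x) - size)])
window_sums
# [1]  6  9 12

# Same answer as summing each window directly: 1+2+3, 2+3+4, 3+4+5.
direct <- sapply(seq_len(length(x) - size + 1),
                 function(i) sum(x[i:(i + size - 1)]))
stopifnot(all(window_sums == direct))
```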



Answer 2:


As @Andrie mentioned, the zoo package has some useful moving window functions, such as rollsum and rollapply. I'm not sure what your OS is, and therefore which parallel packages you are using, but here's a quick example:

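library(doSNOW)
library(foreach)
library(zoo)
## Example data with the same dimensions as in the question
oldD <- matrix(
  sample(1:5, (2000*3000), replace=TRUE),
  ncol=2000)
## Start a 3-worker socket cluster and register it with foreach
cl <- makeCluster(3,"SOCK")
registerDoSNOW(cl)
## Rolling sum (window of 15) of each column, bound together column-wise;
## .packages loads zoo on every worker so that rollsum is available there
newD <- foreach(j=1:ncol(oldD),
                .combine=cbind,
                .packages="zoo") %dopar% {

                  rollsum(oldD[,j],15)

                }
## Shut the workers down
stopCluster(cl)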
##
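
Since the question mentions parLapply and parSapply, here is a minimal sketch of the same column-wise job using parApply from the base parallel package instead of foreach. It is only an illustration under assumptions: window_sums is a hypothetical helper built on the cumsum idea from the first answer, and the 300 x 20 matrix and the 2 workers are arbitrary choices to keep the example small.

```r
library(parallel)

# Hypothetical helper: all length-`size` window sums of one column, via cumsum.
window_sums <- function(x, size) {
  cs <- cumsum(x)
  cs[size:length(x)] - c(0, cs[seq_len(length(x) - size)])
}

# Small made-up matrix (the real one is 3000 x 2000).
set.seed(1)
m <- matrix(sample(1:5, 300 * 20, replace = TRUE), ncol = 20)

cl <- makeCluster(2)  # worker count is an arbitrary choice
# Parallel counterpart of apply(m, 2, window_sums, size = 15).
par_sums <- parApply(cl, m, 2, window_sums, size = 15)
stopCluster(cl)

# Serial result for comparison.
ser_sums <- apply(m, 2, window_sums, size = 15)
stopifnot(identical(dim(par_sums), dim(ser_sums)),
          all(par_sums == ser_sums))
```

In a larger script that already has a cluster open, that existing cluster object could presumably be passed as cl instead of creating and stopping a new one. With a per-column function this cheap, the cost of shipping the data to the workers may outweigh the gain, so it is worth timing both versions.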


Source: https://stackoverflow.com/questions/24388500/sliding-window-on-each-column-of-a-matrix-in-r-with-parallel-processing
