Question
I have a large matrix with 2000 columns and 3000 rows. For each column, I want to run a sliding window that sums 15 rows, moves down one row, sums the next 15, and so on, and then build a new matrix from the results. I have a function that works (although it seems a bit slow), but I would like to run it in parallel: this is part of a larger script, and if I use apply functions without their parallel equivalents, the open cluster shuts down. Moreover, I have to do this whole operation 100 times. Later in the script, I use parLapply or parSapply.
Here is the code I have in a non-parallel fashion:
# df is a matrix with 2000 columns and 3000 rows, all numeric and no NAs
size <- 15                   # size of window
len <- nrow(df) - size + 1   # number of sliding windows to perform
sumsmatrix <- apply(df, 2, function(x) {
  result <- sapply(1:len, function(y) {
    sum(x[y:(y + size - 1)])
  })
  return(result)
})
Thanks in advance. Ron
Answer 1:
Try using cumsum; that way you won't have to sum the same numbers over and over again.
sumsmatrix <- apply(df, 2, function(x) {
  cs <- cumsum(x)  # running totals, computed once per column
  cs[size:nrow(df)] - c(0, cs[1:(len - 1)])
})
It should be about 100 times faster than what you were doing.
Here's how it works:
Let's just say that your x is only 5 long, and your window size is 3, to make it easier.
x <- 1:5
x
# [1] 1 2 3 4 5
cumsum(x)
# [1] 1 3 6 10 15
So, the third number of cumsum(x) is what you want for the first sum, but the fourth and fifth numbers are too big, because they include numbers from before the start of their windows. So, you just subtract the earlier cumulative sums from them.
cumsum(x)[3:5]
# [1] 6 10 15
cumsum(x)[1:2]
# [1] 1 3
But, for the first one you need to subtract zero.
cumsum(x)[3:5]
# [1] 6 10 15
c(0,cumsum(x)[1:2])
# [1] 0 1 3
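To finish the toy example, here is a minimal sketch that does the subtraction and checks it against the direct window sums from the question. The names cs, fast and slow are only for illustration and do not appear in the original answer.

```r
# Toy data from the explanation above
x <- 1:5
size <- 3
len <- length(x) - size + 1   # number of windows

cs <- cumsum(x)

# Cumulative-sum approach: total up to the end of each window,
# minus the total up to just before the start of that window
fast <- cs[size:length(x)] - c(0, cs[seq_len(len - 1)])

# Direct approach from the question, for comparison
slow <- sapply(seq_len(len), function(i) sum(x[i:(i + size - 1)]))

fast
# [1] 6 9 12
all.equal(fast, slow)
# [1] TRUE
```

The three results are 1+2+3, 2+3+4 and 3+4+5, so both approaches agree.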
Answer 2:
As @Andrie mentioned, the zoo package has some useful moving window functions, such as rollsum and rollapply. I'm not sure what your OS is, i.e. what parallel packages you are using, but here's a quick example:
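library(doSNOW)
library(foreach)
library(zoo)
## example data: 3000 x 2000 matrix of random integers
oldD <- matrix(
  sample(1:5, (2000*3000), replace=TRUE),
  ncol=2000)
## start a 3-worker socket cluster and register it with foreach
cl <- makeCluster(3,"SOCK")
registerDoSNOW(cl)
## rolling sum (window of 15) for each column, combined column-wise
newD <- foreach(j=1:ncol(oldD),
                .combine=cbind,
                .export="rollsum") %dopar% {
  rollsum(oldD[,j],15)
}
## shut the cluster down when finished
stopCluster(cl)
##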
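Since the question mentions parLapply and parSapply, here is a rough sketch of the same idea using the base parallel package together with the cumsum trick from Answer 1. The helper window_sums, the small test matrix m and the worker count of 2 are assumptions made for illustration; none of them come from the original answers.

```r
library(parallel)

# Hypothetical helper (not from the original answers): sliding-window
# sums of one numeric vector, computed from cumulative sums.
window_sums <- function(x, size) {
  n <- length(x)
  cs <- cumsum(x)
  cs[size:n] - c(0, cs[seq_len(n - size)])
}

# Small stand-in for the real 3000 x 2000 matrix
set.seed(1)
m <- matrix(sample(1:5, 300 * 20, replace = TRUE), ncol = 20)

cl <- makeCluster(2)   # worker count is an arbitrary choice
# parApply is the cluster counterpart of apply(m, 2, ...);
# size is passed as an argument so nothing needs to be exported
par_sums <- parApply(cl, m, 2, window_sums, size = 15)
stopCluster(cl)

dim(par_sums)   # 286 rows (300 - 15 + 1) and 20 columns
```

Because cumsum is already fast, the cost of shipping columns to the workers may outweigh the gain, so it is worth timing this against the plain apply version on the real data.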
Source: https://stackoverflow.com/questions/24388500/sliding-window-on-each-column-of-a-matrix-in-r-with-parallel-processing