Nested for loops in R using foreach function and doParallel library

一世执手 提交于 2019-12-06 14:50:25
Khashaa

The correct parallelization of nested loop uses %:% (read here).

library(foreach)
library(doParallel)
registerDoParallel(detectCores())    
cosine_par1 <- function (x) {  
  co <- foreach(i=1:ncol(x)) %:%
    foreach (j=1:ncol(x)) %dopar% {    
      co = crossprod(x[, i], x[, j])/sqrt(crossprod(x[, i]) * crossprod(x[, j]))
    }
  matrix(unlist(co), ncol=ncol(x))
}

I recommend you to write it in Rcpp, rather than running it in parallel, because foreach(i=2:n, .combine=cbind) will not always bind the columns in the right order. Also, in the above code I removed only the lower triangular condition, but the running time is considerably slower than the unparallelized code time.

set.seed(186)
x=array(data=sample(1000*100), dim=c(1000, 100))
cseq <- cosine_seq(x)
cpar <- cosine_par1(x)
 all.equal(cpar, cseq)
#[1] TRUE
head(cpar[,1])
#[1] 1.0000000 0.7537411 0.7420011 0.7496145 0.7551984 0.7602620
head(cseq[,1])
#[1] 1.0000000 0.7537411 0.7420011 0.7496145 0.7551984 0.7602620

Addendum: For this specific problem, (semi-) vectorization of cosine_seq is possible; cosine_vec is about 40-50 times faster than cosine_seq.

cosine_vec <- function(x){
  crossprod(x) / sqrt(tcrossprod(apply(x, 2, crossprod)))
}
all.equal(cosine_vec(x), cosine_seq(x))
#[1] TRUE
library(microbenchmark)
microbenchmark(cosine_vec(x), cosine_seq(x), times=20L, unit="relative")
#Unit: relative
#          expr      min       lq     mean   median       uq      max neval
# cosine_vec(x)  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000    20
# cosine_seq(x) 55.81694 52.80404 50.36549 52.17623 49.56412 42.94437    20
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!