Writing to global variables when parallelizing with doSNOW in R?

Posted by 烈酒焚心 on 2020-01-22 20:05:25

Question


Is there a problem with accessing or writing to a global variable when using the doSNOW package on multiple cores?

In the program below, each call to MyCalculations(ii) writes to the ii-th column of the matrix "globalVariable"...

Do you think the result will be correct? Will there be hidden catches?

Thanks a lot!

P.S. I have to write to the global variable because this is a simplified example. In fact, I have lots of outputs that need to be carried out of the parallel loops, so writing to global variables is probably the only way.

library(doSNOW)
MaxSearchSpace <- 44 * 5
globalVariable <- matrix(0, 10000, MaxSearchSpace)
cl <- makeCluster(7)
registerDoSNOW(cl)

# MyCalculations() is defined elsewhere; it writes to column ii of globalVariable
foreach(ii = 2:MaxSearchSpace, .combine = cbind, .verbose = FALSE) %dopar% {
  MyCalculations(ii)
}

stopCluster(cl)

P.S. To restate the question: within the doSNOW framework, is there any danger in accessing or writing to global variables? Thanks.


Answer 1:


Since this question is a couple months old, I hope you've found an answer by now. However, in case you're still interested in feedback, here's something to consider:

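When using foreach with a parallel backend, you won't be able to assign to variables in R's global environment in the way you're attempting (you probably noticed this). With doSNOW, each iteration runs in a separate R worker process that receives its own copy of the exported variables, so an assignment inside the loop changes only the worker's copy and never reaches the master session. With a sequential backend the assignment will work, but not with a parallel one like doSNOW.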
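
The difference between the two backends can be checked with a small experiment. This is only a sketch: the matrix names seqMat and parMat, the tiny dimensions, and the two-worker cluster are assumptions made for illustration and are not part of the original question.

```r
library(doSNOW)  # also attaches foreach and snow

# Two illustrative matrices on the master process (names are hypothetical).
seqMat <- matrix(0, nrow = 3, ncol = 4)
parMat <- matrix(0, nrow = 3, ncol = 4)

# Sequential backend: the body runs in the calling environment,
# so the column writes land in seqMat itself.
invisible(foreach(ii = 1:4) %do% {
  seqMat[, ii] <- ii
  NULL
})

# Parallel backend: each worker gets its own copy of parMat.
cl <- makeCluster(2)
registerDoSNOW(cl)
invisible(foreach(ii = 1:4) %dopar% {
  parMat[, ii] <- ii  # changes the worker's copy only
  NULL
})
stopCluster(cl)

# Compare what the master session sees after each loop.
seqChanged <- any(seqMat != 0)
parChanged <- any(parMat != 0)
print(c(sequential = seqChanged, parallel = parChanged))
```

If the reasoning above holds, the sequential loop should have modified seqMat while parMat on the master is still all zeros.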

Instead, have each iteration return all of its results in a list, and assign the value returned by foreach to an object, so that you can extract the appropriate results after all calculations have been completed.

My suggestion starts similarly to your example:

library(doSNOW)
MaxSearchSpace <- 44*5
cl <- makeCluster(parallel::detectCores())

# do not create the globalVariable object

registerDoSNOW(cl)

# Save the results of the `foreach` iterations as
# a list of lists in an object (`theRes`)

theRes <- foreach(ii = 2:MaxSearchSpace, .verbose = FALSE) %dopar% {
  # do some calculations
  theNorms <- rnorm(10000)
  thePois <- rpois(10000, 2)
  # store the results in a list
  list(theNorms, thePois)
}

# release the workers once the loop has finished
stopCluster(cl)

After all iterations have been completed, extract the results from theRes and store them as objects (e.g., globalVariable1, globalVariable2, etc.):

globalVariable1 <- do.call(cbind, lapply(theRes, "[[", 1))
globalVariable2 <- do.call(cbind, lapply(theRes, "[[", 2))
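
To see what the extraction step does, here is a minimal sketch on a hand-built stand-in for theRes. The object theResDemo, its element names norms and pois, and the values are made up for illustration. Naming the list elements is an optional variation that may be easier to keep track of when each iteration returns many outputs.

```r
# Mock of what foreach returns: one list per iteration, each holding two vectors.
theResDemo <- list(
  list(norms = c(0.1, 0.2, 0.3), pois = c(1L, 0L, 2L)),
  list(norms = c(0.4, 0.5, 0.6), pois = c(3L, 1L, 0L))
)

# lapply(..., "[[", name) pulls the same element out of every iteration;
# do.call(cbind, ...) then binds those vectors together as columns.
normsMatrix <- do.call(cbind, lapply(theResDemo, "[[", "norms"))
poisMatrix  <- do.call(cbind, lapply(theResDemo, "[[", "pois"))

# One column per iteration, one row per value within an iteration.
print(dim(normsMatrix))
```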

With this in mind, if you are performing calculations with each iteration that are dependent on the results of calculations from previous iterations, then this type of parallel computing is not the approach to take.
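
As a small illustration of such a dependency (a made-up recurrence, not taken from the question), each step below needs the value computed in the previous step, so the iterations cannot be handed to independent workers and an ordinary loop is the natural choice:

```r
# Hypothetical recurrence: every value is twice the previous one.
nSteps <- 5
state <- numeric(nSteps)
state[1] <- 1

for (ii in 2:nSteps) {
  # iteration ii cannot start before iteration ii - 1 has finished
  state[ii] <- 2 * state[ii - 1]
}

print(state)
```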



Source: https://stackoverflow.com/questions/9404881/writing-to-global-variables-in-using-dosnow-and-doing-parallelization-in-r
