How to set seeds when using parallel package in R

Submitted by 南笙酒味 on 2020-12-31 20:14:17

Question


I am currently using the parallel package in R and I am trying to make my work reproducible by setting seeds.

However, setting the seed before creating the cluster and running the tasks in parallel does not make the results reproducible. I think I need to set the seed on each worker when I create the cluster.

I have made a small example here to illustrate my problem:

library(parallel)

# function to generate 2 uniform random numbers
runif_parallel <- function() {
  # make cluster of two cores
  cl <- parallel::makeCluster(2)

  # sample uniform random numbers
  samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) runif(1))

  # close cluster
  parallel::stopCluster(cl)

  return(unlist(samples))
}

set.seed(41)
test1 <- runif_parallel()

set.seed(41)
test2 <- runif_parallel()

# they should be the same since they have the same seed
identical(test1, test2)

In this example, test1 and test2 should be identical because they use the same seed, but they come back with different values.

Can I get some help with where I'm going wrong please?

Note that I've written this example the way I have to mimic how I'm using it right now - there are probably cleaner ways to generate two random uniform numbers in parallel.


Answer 1:


You need to call set.seed within each job. Here is a reproducible example:

cl <- parallel::makeCluster(2)

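# seed every worker once (the per-job set.seed(i) below overrides this)
parallel::clusterEvalQ(cl, set.seed(41))

# sample uniform random numbers, reseeding inside each job
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) { set.seed(i); runif(1) })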
samples

# [[1]]
# [1] 0.2655087
# 
# [[2]]
# [1] 0.1848823

samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i){set.seed(i);runif(1)})
samples

# [[1]]
# [1] 0.2655087
# 
# [[2]]
# [1] 0.1848823

parallel::stopCluster(cl)
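
As an alternative to reseeding inside every job, the parallel package also provides clusterSetRNGStream, which gives each worker its own L'Ecuyer-CMRG random number stream derived from a single seed. The sketch below reworks the question's runif_parallel along those lines. Two things here are assumptions and not part of the original post: the seed argument is added for illustration, and parLapply is swapped in for parLapplyLB so that each task always goes to the same worker (with load balancing, the assignment of tasks to workers can vary between runs).

```r
library(parallel)

# Hypothetical variant of the question's function: the `seed` argument
# is an illustrative addition, not part of the original code.
runif_parallel <- function(seed) {
  # make cluster of two workers and make sure it is closed on exit
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))

  # give each worker its own reproducible L'Ecuyer-CMRG stream
  clusterSetRNGStream(cl, iseed = seed)

  # static scheduling: task 1 goes to worker 1, task 2 to worker 2
  samples <- parLapply(cl, X = 1:2, fun = function(i) runif(1))

  unlist(samples)
}

test1 <- runif_parallel(41)
test2 <- runif_parallel(41)

# same seed, same worker streams, same task assignment
identical(test1, test2)
```

Seeding happens after the cluster is created, so a set.seed call in the master session before makeCluster plays no part in what the workers generate.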


Source: https://stackoverflow.com/questions/58631433/how-to-set-seeds-when-using-parallel-package-in-r
