Question
I have written a routine that takes a considerable amount of time to run without parallelization.
The problem is that I am unsure what to iterate over, since the work is done in a repeat loop that exits with break.
The loop consists of the following code snippet (the enclosing for loop is not shown):
repeat {
  if (R < p) {
    HAC.sim(K = K, N = ceiling(Nstar), Hstar = Hstar, probs = probs, perms = perms, equal.freq = FALSE, subset.haps = NULL)
  } else {
    break
  }
}
I would like to use foreach() with a parallel backend; however, I am not certain what range is needed in
foreach(i = 1:???) %dopar% {
  some code
}
since I do not know ahead of time when the repeat loop will stop.
Answer 1:
You can iterate over just the number of cores you have and run the repeat loop on each one. A small memory-mapped value shared by all workers then lets each of them detect when another has found the solution, so the rest can stop.
library(bigstatsr)
library(foreach)

ncores <- nb_cores()  # or parallel::detectCores() - 1
fbm <- FBM(1, 1, init = 0)  # shared memory: 1x1 file-backed matrix, 0 means "keep going"
p <- 0.9999
HAC.sim <- function() runif(1)  # placeholder for the real simulation

cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
res <- foreach(i = 1:ncores, .combine = 'c') %dopar% {
  repeat {
    if (fbm[1] != 0) return(NULL)  # another worker has already succeeded
    R <- HAC.sim()
    if (R >= p) {
      fbm[1] <- 1  # tell the others to stop
      return(R)
    }
  }
}
parallel::stopCluster(cl)
res
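If a shared flag is not an option, a different technique (not the one used in the answer above) is to run the attempts in fixed-size batches and check the stopping condition between batches. The sketch below uses only the base parallel package. The names sim_once, run_batch, batch_size and max_rounds, the worker count of 2, and the seed are all assumptions made for illustration, and sim_once is again just a stand-in for the real HAC.sim call.

```r
library(parallel)

# Hypothetical stand-in for the real simulation: one statistic per call.
sim_once <- function() runif(1)

# Hypothetical helper: run n attempts on one worker and return all draws.
# The first argument is the worker index supplied by parLapply; it is unused.
run_batch <- function(worker, n, sim) {
  vapply(seq_len(n), function(i) sim(), numeric(1))
}

p <- 0.9999         # stopping threshold, as in the answer above
batch_size <- 5000  # assumed number of attempts per worker per round
max_rounds <- 1000  # safety cap so the loop cannot run forever

cl <- makeCluster(2)                # assumed worker count
clusterSetRNGStream(cl, iseed = 1)  # independent, reproducible RNG streams

res <- NULL
for (round in seq_len(max_rounds)) {
  # One batch per worker; everything the workers need is passed as arguments.
  draws <- unlist(parLapply(cl, seq_along(cl), run_batch,
                            n = batch_size, sim = sim_once))
  hits <- draws[draws >= p]
  if (length(hits) > 0) {
    res <- hits[1]  # keep the first successful draw and stop
    break
  }
}
stopCluster(cl)
res
```

The trade-off is that workers only learn about a success at the end of a round, so up to one batch of work per worker may be wasted; a smaller batch_size reduces that waste at the cost of more communication with the workers.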
Source: https://stackoverflow.com/questions/47019114/how-can-foreach-in-the-parallel-r-package-be-handled-a-repeat-loop-with-breaks