Parallelizing heterogeneous tasks in R: foreach, doMC, doParallel

Submitted by 霸气de小男生 on 2020-05-25 08:27:39

Question


Here's what's been puzzling me:

When you use foreach to schedule a sequence of tasks that are homogeneous in content but heterogeneous in processing time (not known ex ante), in what order does foreach actually hand these embarrassingly parallel tasks to the workers?

For instance, I registered 4 cores with registerDoMC(cores=4) and I have 10 tasks, and the 4th and 5th each turn out to take longer than all the others combined. The first batch is obviously tasks 1, 2, 3 and 4. When tasks 1, 2 and 3 are done, how exactly does foreach assign the remaining tasks? Is the order random (it seems so from what I observe)? And what is good practice for speeding things up when some tasks take much longer than others?

Sorry for not providing a concrete example; my actual project code is much more involved...

Any experience, guidance, or pointers would be much appreciated!


Answer 1:


The doMC package is a wrapper around mclapply, and by default mclapply preschedules tasks, which means it splits the tasks into groups, or chunks. The twist is that it preschedules those tasks round-robin. Thus, if you have 10 tasks and 4 workers, the tasks will be assigned as follows:

  • worker 1: tasks 1, 5, 9
  • worker 2: tasks 2, 6, 10
  • worker 3: tasks 3, 7
  • worker 4: tasks 4, 8
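
To make the assignment above concrete, here is a minimal base-R sketch that reproduces the round-robin split. The variable names (n_tasks, n_workers, assignment) are made up for illustration; the code only mimics the grouping and does not call mclapply itself.

```r
# Illustration only: reproduce the round-robin grouping of 10 tasks over 4 workers.
n_tasks   <- 10
n_workers <- 4

# Worker w gets tasks w, w + n_workers, w + 2 * n_workers, ...
assignment <- lapply(seq_len(n_workers),
                     function(w) seq(w, n_tasks, by = n_workers))
names(assignment) <- paste0("worker", seq_len(n_workers))

print(assignment)
```

Because each group is fixed before any work starts, a worker that happens to draw a long task still has to run everything else in its group afterwards.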

If you're lucky, this round-robin assignment gives reasonable performance even when the tasks have very different lengths. If not, you can disable prescheduling in doMC as follows:

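library(doMC)            # also attaches foreach
registerDoMC(cores = 4)

# Disable prescheduling so tasks are handed out one at a time
opts <- list(preschedule = FALSE)
results <- foreach(i = 1:10, .options.multicore = opts) %dopar% {
    # ...
}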

This makes doMC call mclapply with mc.preschedule=FALSE, so each task is assigned to a worker as soon as that worker finishes its previous task, which naturally balances the load.
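
As a rough way to see the difference, the sketch below simulates both strategies in base R for the scenario in the question (10 tasks, 4 workers, tasks 4 and 5 much longer than the rest). The durations are invented for illustration, and the model is idealized: it ignores fork and communication overhead and assumes that, without prescheduling, tasks are started in order on whichever worker becomes free first.

```r
# Hypothetical task durations in seconds: tasks 4 and 5 dominate.
durations <- c(1, 1, 1, 10, 10, 1, 1, 1, 1, 1)
n_workers <- 4

# Prescheduled (round-robin): each worker runs a fixed group of tasks,
# so its finishing time is the sum of the durations in that group.
chunks <- lapply(seq_len(n_workers),
                 function(w) seq(w, length(durations), by = n_workers))
prescheduled_makespan <- max(sapply(chunks, function(idx) sum(durations[idx])))

# Not prescheduled: each task, in order, goes to the worker that is free first.
busy_until <- numeric(n_workers)
for (d in durations) {
  w <- which.min(busy_until)           # earliest-available worker
  busy_until[w] <- busy_until[w] + d   # that worker is busy for d more seconds
}
dynamic_makespan <- max(busy_until)

cat("prescheduled:", prescheduled_makespan, "\n")
cat("dynamic:     ", dynamic_makespan, "\n")
```

With these made-up numbers the gain is modest, because a single long task already sets a floor on the total time. The gap widens when the round-robin split happens to put several long tasks on the same worker.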



Source: https://stackoverflow.com/questions/40578784/parallelizing-heterogenous-tasks-in-r-foreach-domc-doparallel
