R mclapply vs foreach

江枫思渺然 提交于 2021-02-06 09:53:29

问题


I use mclapply for all my "embarassingly parallel" computations. I find it clean and easy to use, and when arguments mc.cores = 1 and mc.preschedule = TRUE I can insert browser() in the function inside mclapply and debug line by line just like in regular R. This is a huge help in getting code to production quicker.

What does foreach offer that mclapply does not? Is there a reason I should consider writing foreach code going forward?

If I understand correctly, both can use the multicore approach to parallel computations (permitting forking) which I like to use for performance reasons.

I have seen foreach being used in various packages, and have read the basics of it, but frankly I don't find it as easy to use. I also am unable to figure out how to get the browser() to work in foreach function calls. (yes I have read this thread browser mode with foreach %dopar% but didn't help me to get the browser to work right).


回答1:


The problem is almost the same as described here: Understanding the differences between mclapply and parLapply in R .

The mclapply is creating clones of the master process for each worker processes (threads/cores) at the point that mclapply is called, reproducibility is guaranteed. Unfortunately, that isn't possible on Windows where in contrast to multicore there is always used the multisession parallelism by foreach or parLapply.

When using parLapply or foreach with %dopar%, you generally have to perform the following additional steps: Create a PSOCK cluster, Register the cluster if desired, Load necessary packages on the cluster workers, Export necessary data and functions to the global environment of the cluster workers.

That is why foreach has parameters like .packages and .export which enable us to distribute everything needed across sessions.

future package provided details of differences between mulicore and multisession processing https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html




回答2:


As Steve Weston (author of foreach) says here, using foreach with doParallel as backend you can initialize workers. This can be helpful for setting up a database connection more efficiently once per worker instead of once per task.



来源:https://stackoverflow.com/questions/44806048/r-mclapply-vs-foreach

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!