dask, joblib, ipyparallel and other schedulers for embarrassingly parallel problems

Submitted by 强颜欢笑 on 2019-12-24 13:51:41

Question


This is a more general question about how to run "embarrassingly parallel" problems with Python "schedulers" in a scientific-computing environment.

I have a code base that is a Python/Cython/C hybrid (for this example I'm using github.com/tardis-sn/tardis, but I have similar problems with other codes) and is internally OpenMP-parallelized. It exposes a single function that takes a parameter dictionary and evaluates to an object within a few hundred seconds on ~8 cores: result = fun(paramset, calibdata), where paramset is a dict, result is an object (essentially a collection of pandas and numpy arrays), and calibdata is a pre-loaded pandas DataFrame/object. It logs using the standard Python logging module.

I would like a Python framework that can easily evaluate fun on ~10-100k parameter sets in a SLURM/TORQUE/... cluster environment. Ideally, this framework would automatically spawn workers (given availability, with a few cores each) and distribute the parameter sets between them (different parameter sets take different amounts of time). It would be nice to see the state (in_queue, running, finished, failed) of each parameter set, as well as logs (whether it failed, finished, or is still running).
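To make the desired behaviour concrete, here is a minimal local sketch of that workflow: a bounded pool of workers pulls parameter sets off a queue, and a status table tracks each one. All names here (fun, run_all, the shape of paramset and calibdata) are illustrative placeholders mirroring the interface described above, not the real TARDIS entry point, and a thread pool stands in for cluster workers so the sketch runs anywhere.

```python
# Hypothetical sketch: queue parameter sets into a bounded worker pool
# and track each task's state (in_queue / finished / failed).  On a
# cluster the pool would be replaced by SLURM/TORQUE-spawned workers.
from concurrent.futures import ThreadPoolExecutor, as_completed

def fun(paramset, calibdata):
    # stand-in for the expensive Python/Cython/C evaluation
    return paramset["x"] * calibdata["scale"]

def run_all(paramsets, calibdata, max_workers=2):
    status = {i: "in_queue" for i in range(len(paramsets))}
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit everything; the pool hands out at most max_workers at once
        futures = {pool.submit(fun, p, calibdata): i
                   for i, p in enumerate(paramsets)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()
                status[i] = "finished"
            except Exception:
                status[i] = "failed"
    return status, results
```

This is the pattern the frameworks below (dask, joblib, ipyparallel) generalize to a distributed setting.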

It would be nice if it keeps track of what is finished and what still needs to be done, so that I can restart if my scheduler task fails. It would also be nice if this integrates seamlessly into Jupyter notebooks and runs locally for testing.
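The restartability requirement can be prototyped independently of any scheduler: hash each parameter set to a filename, persist finished results, and skip anything already on disk when re-running. The names below (param_key, checkpointed_run) are hypothetical, not part of any library.

```python
# Hypothetical restartable bookkeeping: finished results are pickled
# under a stable hash of their parameter dict, so a re-run only
# evaluates what is missing.
import hashlib
import json
import pickle
from pathlib import Path

def param_key(paramset):
    # stable content hash of the parameter dict
    blob = json.dumps(paramset, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

def checkpointed_run(fun, paramsets, calibdata, outdir):
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    results = {}
    for p in paramsets:
        key = param_key(p)
        path = outdir / f"{key}.pkl"
        if path.exists():
            # already finished in a previous run: load instead of recompute
            results[key] = pickle.loads(path.read_bytes())
            continue
        result = fun(p, calibdata)
        path.write_bytes(pickle.dumps(result))
        results[key] = result
    return results
```

The same idea maps onto cluster runs: workers write per-task result files, and a restarted driver only submits the parameter sets with no file yet.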

I have tried dask, but it does not seem to queue the tasks; it runs them all at once with client.map(fun, [list of parameter sets]). Maybe there are better tools, or maybe this is a very niche problem. It's also unclear to me what the difference between dask, joblib and ipyparallel is (I have quickly tried all three at various stages).

Happy to give additional info if things are not clear.

UPDATE: dask seems to provide some of the functionality I require, but dealing with an OpenMP-parallelized code on top of dask is not straightforward; see https://github.com/dask/dask-jobqueue/issues/181
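One configuration sketch for that OpenMP interplay (an assumption on my part, not a documented recipe) is to give each dask worker a single task slot while reserving several cores per SLURM job, and to cap the OpenMP thread count via the worker's environment. With dask-jobqueue that might look like the following (parameter values are illustrative; this fragment needs a live SLURM cluster to actually run):

```python
# Config sketch, assuming dask-jobqueue: each SLURM job reserves 8
# cores but runs one single-threaded dask worker process, so OpenMP
# inside fun can use the cores without oversubscription.
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=8,            # cores reserved per SLURM job
    processes=1,        # one dask worker process per job
    memory="16GB",
    env_extra=["export OMP_NUM_THREADS=8"],  # cap OpenMP threads per task
)
cluster.scale(jobs=10)  # spawn up to 10 worker jobs as the queue allows
client = Client(cluster)
```

With this layout, dask sees one slot per worker and hands each worker one parameter set at a time, while OpenMP fans each evaluation out over the job's cores.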

Source: https://stackoverflow.com/questions/54469195/dask-joblib-ipyparallel-and-other-schedulers-for-embarrassingly-parallel-probl
