To get a better understanding of parallelism, I am comparing a set of different pieces of code.
Here is the basic one (code_piece_1).
In your example, dask is slower than Python multiprocessing because you don't specify a scheduler, so dask falls back to its default, the threaded scheduler. As mdurant has pointed out, your code does not release the GIL, so the threaded scheduler cannot execute the task graph in parallel.
Have a look here for a good overview of the topic: https://docs.dask.org/en/stable/scheduler-overview.html
For your code, you could switch to the multiprocessing scheduler by calling:

.compute(scheduler='processes')
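As a minimal sketch of the difference (your actual code_piece_1 is not shown here, so heavy_computation below is a hypothetical CPU-bound stand-in):

    import dask

    # Hypothetical CPU-bound task standing in for code_piece_1
    @dask.delayed
    def heavy_computation(x):
        return sum(i * i for i in range(x))

    tasks = [heavy_computation(n) for n in [10**6] * 8]

    # Threaded scheduler (the default for dask.delayed): little to no speedup
    # for pure-Python work because the GIL is never released.
    results_threads = dask.compute(*tasks, scheduler='threads')

    # Multiprocessing scheduler: each task runs in its own process,
    # so the GIL is no longer a bottleneck.
    results_processes = dask.compute(*tasks, scheduler='processes')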
If you use the multiprocessing scheduler, all communication between processes still has to pass through the main process. You might therefore also want to look at the distributed scheduler, in which worker processes can communicate with each other directly; this is especially beneficial for complex task graphs. The distributed scheduler also supports work stealing to balance load between processes and provides a web interface with diagnostic information about running tasks. It often makes sense to use the distributed scheduler rather than the multiprocessing scheduler even if you only want to compute on a single local machine.
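A small sketch of using the distributed scheduler on a local machine (again with a hypothetical CPU-bound function, since your original code is not shown):

    import dask
    from dask.distributed import Client

    # Hypothetical CPU-bound task
    @dask.delayed
    def heavy_computation(x):
        return sum(i * i for i in range(x))

    if __name__ == '__main__':
        # Creating a Client with no arguments starts a local cluster of
        # worker processes and registers it as the default scheduler.
        client = Client()
        print(client.dashboard_link)  # web interface with task diagnostics

        tasks = [heavy_computation(n) for n in [10**6] * 8]
        results = dask.compute(*tasks)

        client.close()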