How to enable proper work stealing in dask.distributed when using task restrictions / worker resources?

Posted by 余生长醉 on 2020-05-30 04:08:42

Question


Context

I'm using dask.distributed to parallelise computations across machines. I therefore have dask-workers running on the different machines, all connected to a dask-scheduler, to which I can then submit my custom graphs together with the required keys.

Due to network mount restrictions, my input data (and output storage) is only available to a subset of the machines ('i/o-hosts'). I tried to deal with this in two ways:

  1. all tasks involved in i/o operations are restricted to i/o-hosts (they can run only on workers on machines that have access to the data) and non-i/o tasks are restricted to non-i/o-hosts
  2. all tasks involved in i/o operations are bound to workers providing the resource 'io' (the i/o-hosts) and non-i/o tasks are bound to workers on non-i/o-hosts which provide the resource 'compute'.

The idea behind not allowing non-i/o tasks to run on i/o-hosts is to make sure the workers on those hosts stay available for i/o tasks.
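For concreteness, the two approaches can be sketched roughly as follows. This is only an illustration, not my actual code: it uses `Client.submit` with its `workers=` and `resources=` keywords instead of custom graphs, the host names and the helper functions `placement_kwargs` and `submit_pipeline` are made up, and the resource amount of 1 per task is an assumption.

```python
# Hypothetical host names; replace with the machines in your own cluster.
IO_HOSTS = ["io-host-1", "io-host-2"]
COMPUTE_HOSTS = ["compute-host-1", "compute-host-2", "compute-host-3"]


def placement_kwargs(is_io, use_resources):
    """Build the Client.submit keywords that pin a task to one host group."""
    if use_resources:
        # Approach 2: assumes the workers were started with a matching
        # resource, e.g.
        #   dask-worker <scheduler-address> --resources "io=1"
        #   dask-worker <scheduler-address> --resources "compute=1"
        return {"resources": {"io": 1} if is_io else {"compute": 1}}
    # Approach 1: restrict the task to workers on the listed hosts.
    return {"workers": IO_HOSTS if is_io else COMPUTE_HOSTS}


def submit_pipeline(client, paths, load, process, use_resources=False):
    """Submit one i/o task and one dependent non-i/o task per input path."""
    futures = []
    for path in paths:
        # The load task touches the network mount, so it is an i/o task.
        raw = client.submit(load, path, **placement_kwargs(True, use_resources))
        # The process task only consumes the loaded data.
        done = client.submit(process, raw, **placement_kwargs(False, use_resources))
        futures.append(done)
    return futures
```

Here `client` would be a `distributed.Client` connected to the scheduler, and `load` and `process` stand in for my real i/o and compute functions.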

Problem

Both approaches work as expected in the sense that they restrict i/o tasks to the right workers. However, I noticed that with either approach, only a few workers accumulate a lot of tasks while most other workers remain idle.
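A rough way to see the imbalance is to count the tasks assigned to each worker. The sketch below assumes a mapping of worker address to task keys, which is the shape I understand `Client.processing()` to return; the helper names `task_counts` and `idle_workers` are made up for illustration.

```python
def task_counts(processing):
    """Map each worker address to the number of tasks assigned to it."""
    return {worker: len(keys) for worker, keys in processing.items()}


def idle_workers(processing):
    """Return the sorted addresses of workers with no assigned tasks."""
    return sorted(worker for worker, keys in processing.items() if not keys)
```

With a connected client this would be called as `task_counts(client.processing())`, assuming idle workers appear in that mapping with an empty collection of keys.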

After reading up on how tasks are distributed among the workers, I found that work stealing seems to be intentionally disabled for restricted tasks (http://distributed.readthedocs.io/en/latest/work-stealing.html). This also seems to apply to the resources framework.

Question

Is there a good way to combine restrictions on tasks with work stealing?

Source: https://stackoverflow.com/questions/46180830/how-to-enable-proper-work-stealing-in-dask-distributed-when-using-task-restricti
