Override dask scheduler to concurrently load data on multiple workers
Question

I want to run graphs/futures on my distributed cluster, all of which have a 'load data' root task followed by a set of training tasks that run on that data. A simplified version would look like this:

```python
from dask.distributed import Client

client = Client(scheduler_ip)
load_data_future = client.submit(load_data_func, 'path/to/data/')
train_task_futures = [client.submit(train_func, load_data_future, params)
                      for params in train_param_set]
```

Running this as above, the scheduler gets one worker to read the
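One way to get the loaded data onto multiple workers, rather than leaving every training task pinned to the single worker that ran the root task, is to replicate the finished future's result across the cluster with `Client.replicate`. Below is a minimal, runnable sketch of that idea; `load_data_func`, `train_func`, and the parameter set are hypothetical stand-ins for the real ones, and a `LocalCluster` stands in for the remote scheduler so the snippet is self-contained:

```python
from dask.distributed import Client, LocalCluster

def load_data_func(path):
    """Hypothetical loader standing in for the real 'load data' root task."""
    return list(range(10))

def train_func(data, params):
    """Hypothetical training step: scale the summed data by a parameter."""
    return sum(data) * params

# Small in-process cluster so the sketch runs anywhere; in practice you
# would connect with Client(scheduler_ip) as in the question.
cluster = LocalCluster(n_workers=2, processes=False)
client = Client(cluster)

# Load once, then copy the result to every worker so the training tasks
# that depend on it are not all funneled through the loading worker.
load_data_future = client.submit(load_data_func, 'path/to/data/')
client.replicate(load_data_future)

train_param_set = [1, 2, 3]
train_futures = [client.submit(train_func, load_data_future, p)
                 for p in train_param_set]
results = client.gather(train_futures)  # -> [45, 90, 135]

client.close()
cluster.close()
```

`client.replicate(futures)` blocks until the data backing those futures exists on (by default) all workers; `client.scatter(data, broadcast=True)` offers a similar broadcast when the data is loaded client-side instead of on a worker.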