How to efficiently submit tasks with large arguments in Dask distributed?
I want to submit functions to Dask that take large (gigabyte-scale) arguments. What is the best way to do this? I want to run the function many times with different (small) parameters.

Example (bad)

This uses the concurrent.futures interface. We could use the dask.delayed interface just as easily.

    import numpy as np
    from dask.distributed import Client

    x = np.random.random(size=100000000)  # 800MB array
    params = list(range(100))             # 100 small parameters

    def f(x, param):
        pass

    c = Client()

    # The large array x is passed directly to every submit call
    futures = [c.submit(f, x, param) for param in params]

But this is either slower than I would expect or results in memory errors.

OK,