How to store worker-local variables in dask/distributed

Posted by 不羁的心 on 2021-02-07 13:12:23

Question


Using dask 0.15.0, distributed 1.17.1.

I want to memoize some things per worker, such as a client for accessing Google Cloud Storage, because instantiating it is expensive. I'd rather store this in some kind of worker attribute. What is the canonical way to do this? Or are globals the way to go?


Answer 1:


On the worker

You can get access to the local worker with the get_worker function. A slightly cleaner approach than globals is to attach state to the worker:

from dask.distributed import get_worker

def my_function(...):
    worker = get_worker()
    worker.my_personal_state = ...

future = client.submit(my_function, ...)

We should probably add a generic namespace variable on workers to serve as a general place for information like this, but haven't yet.
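As a minimal sketch of how the attach-to-worker idea can memoize an expensive object, the helper below builds the object only on first use. The names `get_or_create`, `make_client` and `_gcs_client` are hypothetical and are not part of Dask; `make_client` stands in for whatever expensive constructor you actually use.

```python
def get_or_create(holder, name, factory):
    """Return holder.<name>, building it with factory() on first use."""
    try:
        return getattr(holder, name)
    except AttributeError:
        value = factory()
        setattr(holder, name, value)
        return value


def make_client():
    # Hypothetical stand-in for an expensive constructor such as a storage client.
    return object()


def my_function(x):
    # Imported inside the task so the import runs on the worker executing it.
    from dask.distributed import get_worker

    worker = get_worker()
    # Assumption: "_gcs_client" is an attribute name we chose ourselves.
    client = get_or_create(worker, "_gcs_client", make_client)
    # ... use `client` here ...
    return x
```

With a multi-threaded worker, two tasks starting at the same moment could each build a client before either stores it; if that matters, guard the creation with a lock or use a per-thread cache.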

As Globals

That said, for things like connections to external services, globals aren't entirely evil. Many systems, like Tornado, use global singletons.
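A rough sketch of the global-singleton approach, assuming the code lives in a module that the workers can import. `ExpensiveClient` and `get_client` are hypothetical names used only for illustration.

```python
import functools


class ExpensiveClient:
    """Hypothetical stand-in for a costly-to-build service client."""

    instances = 0  # counts constructions, only to make the caching visible

    def __init__(self):
        ExpensiveClient.instances += 1


@functools.lru_cache(maxsize=None)
def get_client():
    # The cache is module-level state, so it is shared by every task
    # that runs in the same worker process.
    return ExpensiveClient()


def my_function(x):
    client = get_client()  # built on the first call, reused afterwards
    # ... use `client` here ...
    return x
```

Each worker process keeps its own copy of the module state, so this gives one client per process, not one per cluster.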

If you care about thread safety

Note that workers are often multi-threaded. If your connection object isn't thread-safe, you may need to cache a separate object per thread. For this I recommend using a threading.local object. Dask itself uses one, which you can import with:

from distributed.worker import thread_state
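A per-thread cache built on your own threading.local object might look like the sketch below. `make_connection` and `get_connection` are hypothetical names, and `make_connection` stands in for the real, non-thread-safe connection constructor.

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_local = threading.local()


def make_connection():
    # Hypothetical stand-in for building a non-thread-safe connection.
    return object()


def get_connection():
    """Return this thread's connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = make_connection()
        _local.conn = conn
    return conn
```

Tasks call `get_connection()` instead of constructing a connection directly, so each worker thread ends up with exactly one connection of its own.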



Answer 2:


Dask Actors

For simpler use cases, other solutions may be preferable; however, it's worth considering Actors. Actors are currently an experimental feature in Dask that enables stateful computations.
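As a small sketch of the Actor pattern, assuming a version of distributed recent enough to support `actor=True` and an already connected `Client`: the `Counter` class and the `run_on_cluster` helper are illustrative names, not part of Dask.

```python
class Counter:
    """State held here stays on the one worker that hosts the actor."""

    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


def run_on_cluster(client):
    # `client` is assumed to be a connected dask.distributed.Client.
    future = client.submit(Counter, actor=True)  # instantiate Counter on a worker
    counter = future.result()                    # local proxy for the remote object
    # Method calls on the proxy run on the hosting worker and return
    # a future-like object; .result() waits for the value.
    return counter.increment().result()
```

The same idea could wrap an expensive client: the actor holds it, and tasks call methods on the actor instead of building their own.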




Source: https://stackoverflow.com/questions/45008852/how-to-store-worker-local-variables-in-dask-distributed
