Question
I am trying to do something like
resource = MyResource()
def fn(x):
    something = dosomething(x, resource)
    return something
client = Client()
results = client.map(fn, data)
The issue is that resource is not serializable and is expensive to construct.
Therefore I would like to construct it once on each worker and have it available for use by fn.
How do I do this?
Or is there some other way to make resource available on all workers?
Answer 1:
You can always construct a lazy resource, something like
class GiveAResource:
    # a class-level mutable container, so the built resource is cached
    # and shared by every instance within a worker process
    resource = [None]

    def get_resource(self):
        # construct the expensive resource only on first use in this process
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]
An instance of this will serialise between processes fine, so you can pass it as an input to any function executed on the workers; calling .get_resource() on it then returns the locally built expensive resource (which will be rebuilt on any worker that appears later on).
This class would be best defined in a module rather than dynamic code.
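For illustration, a minimal usage sketch of the above, assuming MyResource, dosomething and data from the question are importable on the workers (the module name give_a_resource and the keyword name giver are made up here):

from dask.distributed import Client
from give_a_resource import GiveAResource, MyResource, dosomething

def fn(x, giver):
    resource = giver.get_resource()   # built once per worker process, then reused
    return dosomething(x, resource)

client = Client()
giver = GiveAResource()               # cheap to create and to serialise
results = client.map(fn, data, giver=giver)   # extra kwargs are forwarded to fn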
There is no locking here, so if several threads ask for the resource at the same time when it has not been needed so far, you will get redundant work.
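If that redundant work matters, a lock can be added; a rough variant of the same class guarding construction with a threading.Lock (kept as a class attribute, so pickled instances do not carry the lock itself):

import threading

class GiveAResource:
    resource = [None]
    _lock = threading.Lock()

    def get_resource(self):
        # take the lock before checking, so concurrent threads in one
        # worker never build MyResource twice
        with self._lock:
            if self.resource[0] is None:
                self.resource[0] = MyResource()
        return self.resource[0]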
Source: https://stackoverflow.com/questions/54469698/initializing-state-on-dask-distributed-workers