Setting up Dask worker with variable

Submitted by 半城伤御伤魂 on 2019-12-24 20:13:56

Question


I would like to distribute a larger object (or load from disk) when a worker loads and put it into a global variable (such as calib_data). Does that work with dask workers?


Answer 1:


It seems the client method register_worker_callbacks can do what you want here. You will still need somewhere to put your variable, since Python has no truly global scope. That somewhere could be an attribute of any imported module, which every worker would then have access to. You could also add it as an attribute of the worker instance itself, though I see no obvious reason to prefer that.
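A minimal sketch of the register_worker_callbacks route, attaching the data to the worker instance as described above (the calibration dict, load_calib, and use_calib are illustrative stand-ins; an in-process cluster is used just to keep the example self-contained):

```python
from distributed import Client, get_worker

client = Client(processes=False, n_workers=1, threads_per_worker=1)

def load_calib(dask_worker):
    # Runs once on every current and future worker. Naming the argument
    # dask_worker tells distributed to pass in the worker instance.
    dask_worker.calib_data = {"offset": 1.5}  # stand-in for a disk load

client.register_worker_callbacks(load_calib)

def use_calib(x):
    # Inside a task, get_worker() returns the worker executing it,
    # so the attribute set above is available.
    return x + get_worker().calib_data["offset"]

result = client.submit(use_calib, 1).result()

client.close()
```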

One way which works is to hijack a randomly picked builtin module; I do not particularly recommend this (see below):

def attach_var(name, value):
    # stash the value as an attribute of an imported module; any code
    # running on the same worker that imports re will then see it
    import re
    setattr(re, name, value)

client.run(attach_var, 'x', 1)

def use_var():
    # any function running on a worker can read it back, whether run
    # via delayed, submit, or client.run
    import re
    return re.x

client.run(use_var)

Before going ahead, though, have you already considered delayed(calib_data) or scatter, which will copy your variable to where it's needed, e.g.,

futures = client.scatter(calib_data, broadcast=True)
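A fuller sketch of the scatter route: the data is copied to every worker once, and later tasks receive it by reference rather than re-serializing it per call (calib_data and process are illustrative placeholders; the list wrapper around calib_data keeps scatter from treating a dict as a mapping of separate items):

```python
from distributed import Client

client = Client(processes=False, n_workers=2, threads_per_worker=1)

calib_data = {"offset": 1.5}  # placeholder for the real calibration object

# broadcast=True copies the data to all workers up front;
# the returned future stands in for the data in later calls
[calib_future] = client.scatter([calib_data], broadcast=True)

def process(x, calib):
    return x + calib["offset"]

# futures passed as arguments are resolved to the data on the worker
results = client.gather(client.map(process, [1, 2, 3], calib=calib_future))

client.close()
```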

or indeed loading the data on the workers using ordinary delayed semantics:

dcalib = dask.delayed(load_calib_data)()
work = dask.delayed(process_stuff)(dataset1, dcalib)
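The delayed pattern above, end to end; load_calib_data and process_stuff are stand-ins here, and the point is that the single dcalib task is shared by everything downstream rather than re-run per task:

```python
import dask

def load_calib_data():
    # placeholder for reading a large calibration object from disk
    return {"offset": 1.5}

def process_stuff(x, calib):
    return x + calib["offset"]

dcalib = dask.delayed(load_calib_data)()       # loaded once, lazily
work = dask.delayed(process_stuff)(1, dcalib)  # depends on the one load task
result = work.compute()
```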


Source: https://stackoverflow.com/questions/54432928/setting-up-dask-worker-with-variable
