dask

Actors and dask-workers

让人想犯罪 submitted on 2021-02-09 08:28:47
Question:

    client = Client('127.0.0.1:8786', direct_to_workers=True)
    future1 = client.submit(Counter, workers='ninja', actor=True)
    counter1 = future1.result()
    print(counter1)

All is well, but what if the client gets restarted? How do I get the actor back from the worker called ninja?

Answer 1: There is no user-facing way to do this as of 2019-03-06. I recommend raising a feature request issue.

Source: https://stackoverflow.com/questions/54918699/actors-and-dask-workers
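
Since there is no retrieval API, the practical fallback after a client restart is to reconnect and submit the actor again, accepting that any state held by the old instance is lost. A minimal sketch, assuming a toy Counter class and the worker name 'ninja' from the question:

```python
from dask.distributed import Client

class Counter:
    """Toy actor class, assumed for illustration."""
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

client = Client('127.0.0.1:8786', direct_to_workers=True)

# Re-submit the actor after reconnecting; its previous state is gone
future1 = client.submit(Counter, workers='ninja', actor=True)
counter1 = future1.result()
print(counter1.increment().result())  # actor method calls return futures
```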

Why do my Dask Futures get stuck in 'pending' and never finish?

假装没事ソ submitted on 2021-02-08 08:25:25
Question: I have some long-running code (~5-10 minutes of processing) that I'm trying to run as a Dask Future. It's a series of several discrete steps that I can either run as one function:

    result: Future = client.submit(my_function, arg1, arg2)

or split up into intermediate steps:

    # compose the result from the same intermediate results, but with Futures
    intermediate1 = client.submit(my_function1, arg1)
    intermediate2 = client.submit(my_function2, arg1, arg2)
    intermediate3 = client.submit(my
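
The question is cut off above and no answer was captured in the scrape. For context, a self-contained sketch of the intermediate-steps pattern it describes; my_function1/2/3 are hypothetical stand-ins. Passing Futures as arguments lets the scheduler resolve them before the dependent task runs:

```python
from dask.distributed import Client

def my_function1(a):
    return a + 1

def my_function2(a, b):
    return a * b

def my_function3(x, y):
    return x + y

client = Client()  # spins up a local cluster for demonstration

intermediate1 = client.submit(my_function1, 2)
intermediate2 = client.submit(my_function2, 2, 3)

# Futures passed as arguments are resolved by the scheduler,
# so this task waits for both intermediates to finish
result = client.submit(my_function3, intermediate1, intermediate2)
print(result.result())  # (2 + 1) + (2 * 3) = 9
```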

Converting a Dask column into new Dask column of type datetime

这一生的挚爱 submitted on 2021-02-08 04:29:10
Question: I have an unparsed column in a Dask dataframe (df) that I am using pandas to convert to datetime and put into a new column in the Dask dataframe. However, this breaks, as column assignment doesn't support type DatetimeIndex:

    df['New Column'] = pd.to_datetime(np.array(df.index.values), format='%Y/%m/%d %H:%M')

Answer 1: This should work:

    import dask.dataframe as dd
    # note: df is a dask dataframe
    df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')

Source: https://stackoverflow.com/questions
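
A runnable version of the answer's approach, with a small synthetic dataframe (the index values and partition count are assumptions for illustration):

```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame(
    {'x': [1, 2]},
    index=['2021/01/01 10:00', '2021/01/02 11:30'],
)
df = dd.from_pandas(pdf, npartitions=1)

# dd.to_datetime builds a lazy, partition-wise parse instead of
# materializing a DatetimeIndex on the client
df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')
print(df.compute())
```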

Parallelize loop over numpy rows

放肆的年华 submitted on 2021-02-07 14:36:55
Question: I need to apply the same function to every row of a numpy array and store the results in another numpy array:

    # states will contain the result of the function applied to each row of array
    states = np.empty_like(array)
    for i, ar in enumerate(array):
        states[i] = function(ar, *args)
    # do some other stuff on states

function does some non-trivial filtering of my data and returns an array indicating where the conditions are True and where they are False. function can be either pure Python or Cython-compiled. The
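
The question is truncated above and no answer was captured. One common way to parallelize such a row loop with dask is to build one delayed call per row. A minimal sketch, with a toy stand-in for function:

```python
import numpy as np
import dask

def function(row):
    # Toy stand-in: zero out negative entries; the real function
    # in the question does non-trivial filtering
    return np.where(row > 0, row, 0)

array = np.random.randn(1000, 16)

# One lazy task per row, computed in parallel by the local scheduler
tasks = [dask.delayed(function)(row) for row in array]
states = np.array(dask.compute(*tasks))
```

Per-row tasks carry scheduler overhead, so for cheap row functions it is usually better to work in chunks (e.g. dask.array with map_blocks) rather than one task per row.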

how to store worker-local variables in dask/distributed

不羁的心 submitted on 2021-02-07 13:12:23
Question: Using dask 0.15.0, distributed 1.17.1. I want to memoize some things per worker, like a client for accessing Google Cloud Storage, because instantiating it is expensive. I'd rather store this in some kind of worker attribute. What is the canonical way to do this? Or are globals the way to go?

Answer 1: On the worker, you can get access to the local worker with the get_worker function. A slightly cleaner approach than globals is to attach state to the worker:

    from dask.distributed import get_worker
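
The answer's code is cut off right after the import. A sketch of the attach-state-to-the-worker pattern it begins; make_gcs_client is a hypothetical factory standing in for the expensive client construction:

```python
from dask.distributed import get_worker

def make_gcs_client():
    # Hypothetical stand-in for the expensive construction,
    # e.g. a google.cloud.storage.Client()
    return object()

def gcs_client():
    worker = get_worker()  # only valid inside a task running on a worker
    try:
        return worker._gcs_client  # reuse the cached instance
    except AttributeError:
        worker._gcs_client = make_gcs_client()
        return worker._gcs_client

def my_task(path):
    client = gcs_client()  # shared across all tasks on the same worker
    return path  # placeholder for real work using `client`
```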

dask performance apply along axis

廉价感情. submitted on 2021-02-07 09:35:25
Question: I am trying to compute the linear trend over time on a large, high-resolution ocean model dataset using dask. I have followed this example (Applying a function along an axis of a dask array) and found the syntax of apply_along_axis easier. I am currently using dask.array.apply_along_axis to wrap a numpy function over 1-dimensional arrays and then package the resulting dask array into an xarray DataArray. Using top -u <username> suggests that the computation is not executed in parallel (~100% cpu
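
The question is cut off above and no answer was captured. For reference, a minimal sketch of the da.apply_along_axis pattern it describes, with a toy least-squares trend function; the array shape and chunking are assumptions:

```python
import numpy as np
import dask.array as da

def linear_trend(y):
    # Least-squares slope of a 1-D time series against its index
    x = np.arange(len(y))
    return np.array([np.polyfit(x, y, 1)[0]])

# Synthetic stand-in for the ocean data, shaped (time, lat, lon)
data = da.random.random((120, 90, 180), chunks=(120, 30, 60))

# Apply the 1-D function along the time axis (axis 0); dtype and shape
# describe linear_trend's output so dask can build the graph without
# a trial call
trend = da.apply_along_axis(linear_trend, 0, data, dtype='float64', shape=(1,))
result = trend.compute()
print(result.shape)  # (1, 90, 180)
```

As a general note, if the wrapped function is pure Python and holds the GIL, the default threaded scheduler can appear to run serially (~100% CPU total); computing with processes (compute(scheduler='processes')) or a distributed cluster is the usual remedy.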