dask

Actors and dask-workers

让人想犯罪 submitted on 2021-02-09 08:28:47
Question:

    client = Client('127.0.0.1:8786', direct_to_workers=True)
    future1 = client.submit(Counter, workers='ninja', actor=True)
    counter1 = future1.result()
    print(counter1)

All is well, but what if the client gets restarted? How do I get the actor back from the worker called ninja?

Answer 1: There is no user-facing way to do this as of 2019-03-06. I recommend raising a feature request issue.

Source: https://stackoverflow.com/questions/54918699/actors-and-dask-workers
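
Since there is no retrieval API, the practical fallback after a client restart is to reconnect and submit the actor again, accepting that any state held by the old instance is lost. A minimal sketch, assuming a toy Counter class and the worker name 'ninja' from the question:

```python
from dask.distributed import Client

class Counter:
    """Toy actor class, assumed for illustration."""
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

client = Client('127.0.0.1:8786', direct_to_workers=True)

# Re-submit the actor after reconnecting; its previous state is gone
future1 = client.submit(Counter, workers='ninja', actor=True)
counter1 = future1.result()
print(counter1.increment().result())  # actor method calls return futures
```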

Why do my Dask Futures get stuck in 'pending' and never finish?

假装没事ソ submitted on 2021-02-08 08:25:25
Question: I have some long-running code (~5-10 minutes of processing) that I'm trying to run as a Dask Future. It's a series of several discrete steps that I can either run as one function:

    result: Future = client.submit(my_function, arg1, arg2)

or split up into intermediate steps:

    # compose the result from the same intermediate results, but with Futures
    intermediate1 = client.submit(my_function1, arg1)
    intermediate2 = client.submit(my_function2, arg1, arg2)
    intermediate3 = client.submit(my
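
The question is cut off above and no answer was captured in the scrape. For context, a self-contained sketch of the intermediate-steps pattern it describes; my_function1/2/3 are hypothetical stand-ins. Passing Futures as arguments lets the scheduler resolve them before the dependent task runs:

```python
from dask.distributed import Client

def my_function1(a):
    return a + 1

def my_function2(a, b):
    return a * b

def my_function3(x, y):
    return x + y

client = Client()  # spins up a local cluster for demonstration

intermediate1 = client.submit(my_function1, 2)
intermediate2 = client.submit(my_function2, 2, 3)

# Futures passed as arguments are resolved by the scheduler,
# so this task waits for both intermediates to finish
result = client.submit(my_function3, intermediate1, intermediate2)
print(result.result())  # (2 + 1) + (2 * 3) = 9
```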

Converting a Dask column into new Dask column of type datetime

这一生的挚爱 submitted on 2021-02-08 04:29:10
Question: I have an unparsed column in a Dask dataframe (df) that I am using pandas to convert to datetime and put into a new column in the Dask dataframe. However, this breaks, as column assignment doesn't support type DatetimeIndex:

    df['New Column'] = pd.to_datetime(np.array(df.index.values), format='%Y/%m/%d %H:%M')

Answer 1: This should work:

    import dask.dataframe as dd
    # note: df is a dask dataframe
    df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')

Source: https://stackoverflow.com/questions
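
A runnable version of the answer's approach, with a small synthetic dataframe (the index values and partition count are assumptions for illustration):

```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame(
    {'x': [1, 2]},
    index=['2021/01/01 10:00', '2021/01/02 11:30'],
)
df = dd.from_pandas(pdf, npartitions=1)

# dd.to_datetime builds a lazy, partition-wise parse instead of
# materializing a DatetimeIndex on the client
df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')
print(df.compute())
```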

Parallelize loop over numpy rows

放肆的年华 submitted on 2021-02-07 14:36:55
Question: I need to apply the same function to every row of a numpy array and store the results in another numpy array:

    # states will contain the result of the function applied to each row of array
    states = np.empty_like(array)
    for i, ar in enumerate(array):
        states[i] = function(ar, *args)
    # do some other stuff on states

function does some non-trivial filtering of my data and returns an array indicating where the conditions are True and where they are False. function can be either pure Python or Cython-compiled. The
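
The question is truncated above and no answer was captured. One common way to parallelize such a row loop with dask is to build one delayed call per row. A minimal sketch, with a toy stand-in for function:

```python
import numpy as np
import dask

def function(row):
    # Toy stand-in: zero out negative entries; the real function
    # in the question does non-trivial filtering
    return np.where(row > 0, row, 0)

array = np.random.randn(1000, 16)

# One lazy task per row, computed in parallel by the local scheduler
tasks = [dask.delayed(function)(row) for row in array]
states = np.array(dask.compute(*tasks))
```

Per-row tasks carry scheduler overhead, so for cheap row functions it is usually better to work in chunks (e.g. dask.array with map_blocks) rather than one task per row.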

how to store worker-local variables in dask/distributed

不羁的心 submitted on 2021-02-07 13:12:23
Question: Using dask 0.15.0, distributed 1.17.1. I want to memoize some things per worker, like a client for accessing Google Cloud Storage, because instantiating it is expensive. I'd rather store this in some kind of worker attribute. What is the canonical way to do this? Or are globals the way to go?

Answer 1: On the worker, you can get access to the local worker with the get_worker function. A slightly cleaner approach than globals is to attach state to the worker:

    from dask.distributed import get_worker
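
The answer's code is cut off right after the import. A sketch of the attach-state-to-the-worker pattern it begins; make_gcs_client is a hypothetical factory standing in for the expensive client construction:

```python
from dask.distributed import get_worker

def make_gcs_client():
    # Hypothetical stand-in for the expensive construction,
    # e.g. a google.cloud.storage.Client()
    return object()

def gcs_client():
    worker = get_worker()  # only valid inside a task running on a worker
    try:
        return worker._gcs_client  # reuse the cached instance
    except AttributeError:
        worker._gcs_client = make_gcs_client()
        return worker._gcs_client

def my_task(path):
    client = gcs_client()  # shared across all tasks on the same worker
    return path  # placeholder for real work using `client`
```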

dask performance apply along axis

廉价感情. submitted on 2021-02-07 09:35:25
Question: I am trying to compute the linear trend over time on a large, high-resolution ocean model dataset using dask. I have followed this example (Applying a function along an axis of a dask array) and found the syntax of apply_along_axis easier. I am currently using dask.array.apply_along_axis to wrap a numpy function over 1-dimensional arrays and then package the resulting dask array into an xarray DataArray. Using top -u <username> suggests that the computation is not executed in parallel (~100% cpu
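
The question is cut off above and no answer was captured. For reference, a minimal sketch of the da.apply_along_axis pattern it describes, with a toy least-squares trend function; the array shape and chunking are assumptions:

```python
import numpy as np
import dask.array as da

def linear_trend(y):
    # Least-squares slope of a 1-D time series against its index
    x = np.arange(len(y))
    return np.array([np.polyfit(x, y, 1)[0]])

# Synthetic stand-in for the ocean data, shaped (time, lat, lon)
data = da.random.random((120, 90, 180), chunks=(120, 30, 60))

# Apply the 1-D function along the time axis (axis 0); dtype and shape
# describe linear_trend's output so dask can build the graph without
# a trial call
trend = da.apply_along_axis(linear_trend, 0, data, dtype='float64', shape=(1,))
result = trend.compute()
print(result.shape)  # (1, 90, 180)
```

As a general note, if the wrapped function is pure Python and holds the GIL, the default threaded scheduler can appear to run serially (~100% CPU total); computing with processes (compute(scheduler='processes')) or a distributed cluster is the usual remedy.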