问题
I am considering using a closure with the current state, to compute the rolling window (which in my case is of width 2), to answer my own question, which I have recently posed. Something on the lines of:
def test(init_value):
def my_fcn(x,y):
nonlocal init_value
actual_value = (x + y) * init_value
init_value = actual_value
return init_value
return my_fcn
where my_fcn is a dummy function used for testing. Therefore the function might be initialised thorugh actual_fcn = test(0);
where we assume the initial value is zero, for example. Finally one could use the function through ddf.apply (where ddf is the actual dask dataframe).
Finally the question: this would work, if the order of the computations is preserved, otherwise everything would be scrambled. I have not tested it, since -even if it passes- I cannot be 100% sure it will always preserve the order. So, question is:
Does dask dataframe's apply method preserve rows order?
Any other ideas? Any help highly appreciated.
回答1:
Apparently yes. I am using dask 1.0.0.
The following code:
import numpy as np
import pandas as pd
import dask.dataframe as dd
number_of_components = 30
df = pd.DataFrame(np.random.randint(0,number_of_components,size=(number_of_components, 4)), columns=list('ABCD'))
my_data_frame = dd.from_pandas(df, npartitions = 1 )
def sumPrevious( previousState ) :
def getValue(row):
nonlocal previousState
something = row['A'] - previousState
previousState = row['A']
return something
return getValue
given_func = sumPrevious(1)
out = my_data_frame.apply(given_func, axis = 1 , meta = float).compute()
behaves as expected. There is a big caveat: if the previous state is provided by reference (i.e.: it is some object of some class) then the user should be careful in using equality inside the nested function to update the previous state: since it will have side effects, if the state is passed by reference.
Rigorously, this example does not prove that order is preserved under any circumstances; so I would still be interested whether I can rely on this assumption.
来源:https://stackoverflow.com/questions/55495906/does-dask-dataframe-apply-preserve-rows-order