Shifting all rows in dask dataframe

Posted by 余生长醉 on 2019-12-07 13:33:39

Question


In Pandas, the method DataFrame.shift(n) shifts the contents of a dataframe by n rows relative to the index, much like np.roll(a, n) except that the vacated rows are filled with NaN instead of wrapping around. I can't find a way to get similar behaviour with Dask. I realise that row shifts may be difficult to manage with Dask's chunked system, but I don't know of a better way to compare each row with the subsequent one.

What I'd like to be able to do is this:

import numpy as np
import pandas as pd
import dask.dataframe as dd

# path and col1 stand for the HDF5 file path and the column name
data = dd.read_hdf(path, 'sim')[col1]
shifted = data.shift(1)

idx = data.apply(np.sign) != shifted.apply(np.sign)

in order to create a boolean series indicating the locations of sign changes in the data. (I am aware that this method would also catch changes from a signed value to zero.) I would then use the boolean series to index a different Dask dataframe for plotting.
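For reference, the intended comparison can be sketched in plain Pandas on a small made-up series (the values and variable names below are illustrative assumptions, not data from the question). It also shows that the first row is compared against NaN and is therefore flagged unless it is masked out.

```python
import numpy as np
import pandas as pd

# Made-up sample values; the zero illustrates the signed -> zero caveat.
data = pd.Series([1.5, 0.0, -2.0, -3.0, 4.0])

sign = np.sign(data)   # 1.0, 0.0, -1.0, -1.0, 1.0
prev = sign.shift(1)   # NaN, 1.0, 0.0, -1.0, -1.0

# Comparison as written in the question: NaN != 1.0 is True, so row 0 is flagged.
naive = sign != prev

# Same comparison, ignoring rows that have no predecessor.
changed = naive & prev.notna()

print(changed.tolist())
```

Whether a move to or from zero should count as a sign change depends on the application; this sketch counts it, as the question's code would.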


Answer 1:


Rolling functions

At the time of writing, dask.dataframe does not implement the shift operation, though it could if you raise an issue (later releases have since added it). In principle this is not so different from the rolling operations that dask.dataframe does support, like rolling_mean, rolling_sum, etc.
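To illustrate why a shift across chunks is feasible in principle, here is a minimal Pandas-only sketch that shifts a series stored as separate chunks by carrying the last value of each chunk into the next one. The helper name shift_chunks and the chunk boundaries are assumptions made for illustration; the code is not taken from dask.dataframe itself.

```python
import pandas as pd

def shift_chunks(chunks):
    """Shift a list of Series chunks down by one row (hypothetical helper).

    The last value of each chunk fills the first row of the next chunk,
    which is the only information that has to cross a chunk boundary.
    """
    carry = None
    out = []
    for chunk in chunks:
        shifted = chunk.shift(1)          # new Series, first row is NaN
        if carry is not None and len(chunk) > 0:
            shifted.iloc[0] = carry       # fill from the previous chunk
        if len(chunk) > 0:
            carry = chunk.iloc[-1]
        out.append(shifted)
    return out

full = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
chunks = [full.iloc[:2], full.iloc[2:4], full.iloc[4:]]

result = pd.concat(shift_chunks(chunks))
print(result.equals(full.shift(1)))
```

Rolling operations need the same kind of boundary exchange, just with more than one row carried over.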

Actually, if you write a Pandas function that adheres to the same API as these pandas.rolling_foo functions, then you can use the dask.dataframe.rolling.wrap_rolling function to turn your Pandas-style rolling function into a dask.dataframe rolling function.

dask.dataframe.rolling_sum = wrap_rolling(pandas.rolling_sum)



Answer 2:


The following code might help to shift the series down by one row.

s = dd_df['column'].rolling(window=2).sum() - dd_df['column']

Edit (03/09/2019):

With a window of 2, the rolling sum for a particular row is:

result[i] = row[i-1] + row[i]

Then by subtracting the old value of the column from the result, you are doing the following operation:

final_row[i] = result[i] - row[i]

Which equals:

final_row[i] = row[i-1] + row[i] - row[i]

This leaves final_row[i] = row[i-1], so the whole column ends up shifted down by one row.
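The arithmetic above can be checked in plain Pandas, where shift(1) is available as a reference. This is a small sketch on made-up values; it also shows one limitation, namely that a NaN inside the window makes the rolling sum NaN, so the trick drops a value that shift(1) would keep.

```python
import numpy as np
import pandas as pd

col = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# Window-2 rolling sum minus the current value leaves the previous value.
trick = col.rolling(window=2).sum() - col
expected = col.shift(1)

# Compare with a tolerance because the rolling sum is floating-point arithmetic.
same = np.allclose(trick, expected, equal_nan=True)
print(same)

# With a missing value, every window of two rows contains a NaN here,
# so the trick returns NaN everywhere while shift(1) still carries 1.0 down.
with_gap = pd.Series([1.0, np.nan, 3.0])
trick_gap = with_gap.rolling(window=2).sum() - with_gap
shift_gap = with_gap.shift(1)
```

On arbitrary floating-point data the result can differ from the true previous value by a small rounding error, since row[i-1] + row[i] - row[i] is not always exactly row[i-1].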

Tip:

If you want to shift the column down by multiple rows, run the whole operation that many times, keeping the window at 2 each time.
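A minimal sketch of the repeated application, written against Pandas so that the result can be compared with shift(n). The helper name shift_down_by_rolling is hypothetical and the sample values are made up.

```python
import numpy as np
import pandas as pd

def shift_down_by_rolling(col, n):
    """Apply the window-2 rolling-sum trick n times (hypothetical helper)."""
    out = col
    for _ in range(n):
        out = out.rolling(window=2).sum() - out
    return out

col = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

twice = shift_down_by_rolling(col, 2)
matches = np.allclose(twice, col.shift(2), equal_nan=True)
print(matches)
```

Each pass adds one more NaN row at the top, which is also what shift(n) produces.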



Source: https://stackoverflow.com/questions/34225275/shifting-all-rows-in-dask-dataframe
