Pandas - Rolling slope calculation

Question

How can I calculate the slope of each column over a rolling window of 60 rows (window=60), stepping by 5?

I only need a value every 5 minutes, not a result for every record.

Here's sample dataframe and results:

df
Time                A    ...      N
2016-01-01 00:00  1.2    ...    4.2
2016-01-01 00:01  1.2    ...    4.0
2016-01-01 00:02  1.2    ...    4.5
2016-01-01 00:03  1.5    ...    4.2
2016-01-01 00:04  1.1    ...    4.6
2016-01-01 00:05  1.6    ...    4.1
2016-01-01 00:06  1.7    ...    4.3
2016-01-01 00:07  1.8    ...    4.5
2016-01-01 00:08  1.1    ...    4.1
2016-01-01 00:09  1.5    ...    4.1
2016-01-01 00:10  1.6    ...    4.1
....

result
Time                A    ...      N
2016-01-01 00:04  xxx    ...    xxx
2016-01-01 00:09  xxx    ...    xxx
2016-01-01 00:14  xxx    ...    xxx
...

Can the df.rolling function be applied to this problem?

It's fine if a window contains NaN, meaning the subset used for a fit may hold fewer than 60 values.

Answer 1:

It seems that what you want is a rolling window with a specific step size. When this answer was written, the pandas documentation stated that rolling did not support a step size; newer releases (pandas 1.5 and later) add a step parameter for fixed-length windows.

If the data size is not too large, just perform rolling on all data and select the results using indexing.

Here's a sample dataset. For simplicity, the time column is represented using integers.

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])

            a
0    8.714074
1    0.985467
2    9.101299
3    4.598044
4    4.193559
..        ...
495  9.736984
496  2.447377
497  5.209420
498  2.698441
499  3.438271

Then roll over the data and compute the slope of each window:

def calc_slope(x):
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# min_periods=2 lets windows with fewer than 60 rows still produce a slope.
# [4::5] keeps every 5th row, starting at row 4.
result = data.rolling(60, min_periods=2).apply(calc_slope)[4::5]

The result is:

            a
4   -0.542845
9    0.084953
14   0.155297
19  -0.048813
24  -0.011947
..        ...
479 -0.004792
484 -0.003714
489  0.022448
494  0.037301
499  0.027189
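
The question mentions that windows may contain NaN. As a minimal sketch of one way to handle that, the variant below drops NaN points before fitting. The name calc_slope_nan, the sample frame, and the small window of 5 (chosen only to keep the demo short) are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def calc_slope_nan(y):
    """Least-squares slope through the non-NaN points of one window.

    Hypothetical helper: x positions are row offsets inside the window,
    so a missing point leaves a gap instead of shifting later points.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))
    mask = ~np.isnan(y)
    if mask.sum() < 2:          # a line needs at least two points
        return np.nan
    return np.polyfit(x[mask], y[mask], 1)[0]

# Illustrative data: one row per minute, column A rises by 2 per row.
idx = pd.date_range('2016-01-01 00:00', periods=20, freq='60s')
df = pd.DataFrame({'A': np.arange(20) * 2.0}, index=idx)
df.iloc[7, 0] = np.nan          # inject a missing value

# raw=True passes each window as a NumPy array, NaN included.
result = df.rolling(5, min_periods=2).apply(calc_slope_nan, raw=True)[4::5]
print(result)
```

Because the fit uses the original row offsets, the slope for the window containing the gap stays the same as for its neighbours.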

Alternatively, see the question "step size in pandas.DataFrame.rolling"; its first answer gives a NumPy-based way to do this.
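
A rough sketch of what a NumPy-based version could look like is given here. It is not the code from the linked post: it assumes NumPy 1.20 or later for sliding_window_view, handles full windows only (no min_periods equivalent), and the helper name stepped_slopes is made up for this example.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def stepped_slopes(values, window, step):
    """Least-squares slope of every `step`-th full window (hypothetical helper)."""
    # Shape (n_windows, window); [::step] keeps every step-th window.
    windows = sliding_window_view(values, window)[::step]
    # Centred x positions, so the slope is sum(x_c * y) / sum(x_c ** 2).
    x_c = np.arange(window) - (window - 1) / 2
    return windows @ x_c / (x_c @ x_c)

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((500, 1)) * 10, columns=['a'])

slopes = stepped_slopes(data['a'].to_numpy(), window=60, step=5)
# Window k covers rows 5k .. 5k+59, so label each slope with its last row.
result = pd.Series(slopes, index=data.index[59::5], name='a')
print(result.head())
```

This computes only the windows that are kept, instead of fitting every row and discarding four out of five results.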

Answer 2:

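Try this, which rolls a 60-row window over column A:

# Roll over the column directly; grouping by "Time" would give one row per group.
# raw=True passes each window to the lambda as a NumPy array.
df["A_slope"] = df["A"].rolling(60).apply(lambda x: np.polyfit(range(60), x, 1)[0], raw=True)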

Answer 3:

You could use pandas resample. Note that this requires a datetime index.

df.index = pd.to_datetime(df.Time)
print(df)
result = df.resample('5min').bfill()
print(result)
                                 Time    A    N
Time                                           
2016-01-01 00:00:00  2016-01-01 00:00  1.2  4.2
2016-01-01 00:01:00  2016-01-01 00:01  1.2  4.0
2016-01-01 00:02:00  2016-01-01 00:02  1.2  4.5
2016-01-01 00:03:00  2016-01-01 00:03  1.5  4.2
2016-01-01 00:04:00  2016-01-01 00:04  1.1  4.6
2016-01-01 00:05:00  2016-01-01 00:05  1.6  4.1
2016-01-01 00:06:00  2016-01-01 00:06  1.7  4.3
2016-01-01 00:07:00  2016-01-01 00:07  1.8  4.5
2016-01-01 00:08:00  2016-01-01 00:08  1.1  4.1
2016-01-01 00:09:00  2016-01-01 00:09  1.5  4.1
2016-01-01 00:10:00  2016-01-01 00:10  1.6  4.1
2016-01-01 00:15:00  2016-01-01 00:15  1.6  4.1

Output

                                 Time    A    N
Time                                           
2016-01-01 00:00:00  2016-01-01 00:00  1.2  4.2
2016-01-01 00:05:00  2016-01-01 00:05  1.6  4.1
2016-01-01 00:10:00  2016-01-01 00:10  1.6  4.1
2016-01-01 00:15:00  2016-01-01 00:15  1.6  4.1
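
The resample call above picks one raw row per 5 minutes rather than a slope. One possible way to combine it with a rolling fit, sketched under the assumption of one row per minute and a datetime index, is to compute the rolling slope first and then keep the last value in each 5-minute bin. The sample data and the calc_slope helper are illustrative.

```python
import numpy as np
import pandas as pd

def calc_slope(y):
    # Slope of a degree-1 least-squares fit against row offsets 0..len(y)-1.
    return np.polyfit(np.arange(len(y)), y, 1)[0]

# Illustrative data: A rises by 0.5 per minute, N is constant.
idx = pd.date_range('2016-01-01 00:00', periods=120, freq='60s')
df = pd.DataFrame({'A': np.arange(120) * 0.5, 'N': 4.0}, index=idx)

slopes = df.rolling(60, min_periods=2).apply(calc_slope, raw=True)

# Each bin is labelled by its start (00:00, 00:05, ...) and holds the
# slope from the bin's last row (00:04, 00:09, ...).
result = slopes.resample('5min').last()
print(result.head())
```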

Answer 4:

Sorry to revive this old question, but I cannot follow the results.

def calc_slope(x):
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# Window of 3 rows; min_periods=3 leaves the first two rows as NaN.
data['slope'] = data['a'].rolling(3, min_periods=3).apply(calc_slope)

print(data.to_string())

with a result of:

           a     slope
0   6.902663       NaN
1   2.257267       NaN
2   0.172393 -3.365135
3   9.642700  3.692717
4   1.221879  0.524743
5   1.634674 -4.004013
6   8.274599  3.526360
7   9.800035  4.082681
8   4.577713 -1.848443
9   1.368656 -4.215690
10  9.377983  2.400135
11  9.795934  4.213639
12  3.045406 -3.166288
13  6.063934 -1.866000
14  8.202430  2.578512

Any ideas?

Thanks.
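
For what it is worth, the numbers in the table look consistent with a straight-line fit. For three equally spaced points at x = 0, 1, 2, the least-squares slope reduces to (y[2] - y[0]) / 2, so each slope is half the difference between the current value and the value two rows earlier. The quick check below uses the first four values of column a from the table; it is a sketch for verification only.

```python
import numpy as np

# First four values of column 'a' from the table above.
y = np.array([6.902663, 2.257267, 0.172393, 9.642700])

# Full least-squares fit over rows 0..2.
fit_slope = np.polyfit(range(3), y[:3], 1)[0]

# Shortcut that holds for exactly three equally spaced points.
shortcut_row2 = (y[2] - y[0]) / 2
shortcut_row3 = (y[3] - y[1]) / 2

print(fit_slope, shortcut_row2, shortcut_row3)
```

Both forms give about -3.365135 for row 2, and the shortcut gives about 3.692717 for row 3, matching the slope column.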

Answer 5:

I use:

    # raw=True passes each window as a NumPy array, so x[-1] and x[0] index by position.
    df['slope_I'] = df['I'].rolling('600s').apply(lambda x: (x[-1] - x[0]) / 600, raw=True)

where the slope has units of (units of I) per second. A time-based window such as '600s' requires a datetime index.

Each value in the slope column is the slope of the line from the first row inside the window to the last, and so on as the window rolls. During the first 600 s the window holds less than 600 s of data but the difference is still divided by 600, so those early values are not reliable; you can replace them with zeros or with the mean.
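
As a minimal sketch of a variant that avoids the fixed divisor, the function below divides by the time actually spanned by each window, read from the window's index. The name endpoint_slope and the sample data are assumptions for illustration; raw=False is used so the function receives a Series with its timestamps.

```python
import numpy as np
import pandas as pd

# Illustrative data: one row per minute, I rises by 3 units per minute.
idx = pd.date_range('2016-01-01', periods=30, freq='60s')
df = pd.DataFrame({'I': np.arange(30) * 3.0}, index=idx)

def endpoint_slope(window):
    """(last - first) / elapsed seconds for one window (hypothetical helper)."""
    elapsed = (window.index[-1] - window.index[0]).total_seconds()
    if elapsed == 0:            # single-row window: slope is undefined
        return np.nan
    return (window.iloc[-1] - window.iloc[0]) / elapsed

# raw=False hands each window over as a Series, index included.
df['slope_I'] = df['I'].rolling('600s').apply(endpoint_slope, raw=False)
print(df.head())
```

With this form the early, shorter windows give the same per-second slope as the full ones, and only the very first row is NaN.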

Best regards.

Source: https://stackoverflow.com/questions/42138357/pandas-rolling-slope-calculation

Tags

python

pandas

regression