Question
How can I calculate the slope of each column's rolling(window=60) values, stepped by 5?
I'd like to calculate the value every 5 minutes; I don't need a result for every record.
Here's a sample dataframe and the desired result:
df
Time A ... N
2016-01-01 00:00 1.2 ... 4.2
2016-01-01 00:01 1.2 ... 4.0
2016-01-01 00:02 1.2 ... 4.5
2016-01-01 00:03 1.5 ... 4.2
2016-01-01 00:04 1.1 ... 4.6
2016-01-01 00:05 1.6 ... 4.1
2016-01-01 00:06 1.7 ... 4.3
2016-01-01 00:07 1.8 ... 4.5
2016-01-01 00:08 1.1 ... 4.1
2016-01-01 00:09 1.5 ... 4.1
2016-01-01 00:10 1.6 ... 4.1
....
result
Time A ... N
2016-01-01 00:04 xxx ... xxx
2016-01-01 00:09 xxx ... xxx
2016-01-01 00:14 xxx ... xxx
...
Can the df.rolling function be applied to this problem?
It's fine if there are NaNs in the window, meaning the subset can be smaller than 60.
Answer 1:
It seems that what you want is rolling with a specific step size.
However, according to the pandas documentation, rolling did not support a step size when this was written (a step parameter was added later, in pandas 1.5).
If the data size is not too large, just perform rolling on all data and select the results using indexing.
Here's a sample dataset. For simplicity, the time column is represented using integers.
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])
a
0 8.714074
1 0.985467
2 9.101299
3 4.598044
4 4.193559
.. ...
495 9.736984
496 2.447377
497 5.209420
498 2.698441
499 3.438271
Then roll and calculate the slopes:
def calc_slope(x):
    # least-squares slope of the window values against 0, 1, 2, ...
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# Set min_periods=2 to allow windows with fewer than 60 rows.
# Use [4::5] to keep every 5th result, starting at row 4.
result = data.rolling(60, min_periods=2).apply(calc_slope)[4::5]
The result will be:
a
4 -0.542845
9 0.084953
14 0.155297
19 -0.048813
24 -0.011947
.. ...
479 -0.004792
484 -0.003714
489 0.022448
494 0.037301
499 0.027189
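To see how this carries over to the layout in the question, here is a minimal sketch with a minute-level DatetimeIndex and several columns. The column values, the seed and the names df and slopes are assumptions made for illustration. raw=True is used so that calc_slope receives plain NumPy arrays.

```python
import numpy as np
import pandas as pd

def calc_slope(x):
    # Least-squares slope of the window against positions 0, 1, 2, ...
    return np.polyfit(np.arange(len(x)), x, 1)[0]

# Synthetic stand-in for the question's data: one row per minute.
idx = pd.date_range("2016-01-01 00:00", periods=120, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "A": 2.0 * np.arange(120) + 1.0,  # exactly linear, slope 2 per row
        "N": rng.random(120) * 10,        # noisy column
    },
    index=idx,
)

# Roll over up to 60 rows per column, then keep every 5th row starting at row 4.
slopes = df.rolling(60, min_periods=2).apply(calc_slope, raw=True).iloc[4::5]
print(slopes.head())
```

The selected rows are labelled 00:04, 00:09, 00:14 and so on, which matches the result layout asked for in the question.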
Alternatively, you can refer to this post, whose first answer provides a NumPy way to achieve this: step size in pandas.DataFrame.rolling
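As a rough sketch of what a NumPy-based approach could look like (this is not the code from the linked post, and the helper name stepped_slopes is made up for illustration), numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+) can build the windows, which are then subsampled before any fitting is done. It assumes full windows only and no NaN values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stepped_slopes(y, window=60, step=5):
    """Least-squares slope of every `step`-th full window of a 1-D array."""
    y = np.asarray(y, dtype=float)
    # Shape (len(y) - window + 1, window); slicing keeps every `step`-th window.
    windows = sliding_window_view(y, window)[::step]
    x = np.arange(window, dtype=float)
    xc = x - x.mean()  # centered positions, so sum(xc) == 0
    # Closed-form slope sum(xc * y) / sum(xc ** 2), for all kept windows at once.
    return windows @ xc / (xc @ xc)

y = 3.0 * np.arange(500) + 7.0  # exactly linear test signal
slopes = stepped_slopes(y)
print(slopes[:3])
```

The k-th value belongs to the window ending at row window - 1 + k * step, so unlike the min_periods=2 version above there are no results for the first 59 rows.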
Answer 2:
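Try this:
# Note: grouping by "Time" only yields full 60-row windows if each Time value has at least 60 rows.
windows = df.groupby("Time")["A"].rolling(60)
df["out"] = windows.apply(lambda x: np.polyfit(range(len(x)), x, 1)[0], raw=True).values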
Answer 3:
You could use pandas resample. Note that this requires an index of datetime values.
df.index = pd.to_datetime(df.Time)
print(df)
result = df.resample('5min').bfill()
print(result)
Time A N
Time
2016-01-01 00:00:00 2016-01-01 00:00 1.2 4.2
2016-01-01 00:01:00 2016-01-01 00:01 1.2 4.0
2016-01-01 00:02:00 2016-01-01 00:02 1.2 4.5
2016-01-01 00:03:00 2016-01-01 00:03 1.5 4.2
2016-01-01 00:04:00 2016-01-01 00:04 1.1 4.6
2016-01-01 00:05:00 2016-01-01 00:05 1.6 4.1
2016-01-01 00:06:00 2016-01-01 00:06 1.7 4.3
2016-01-01 00:07:00 2016-01-01 00:07 1.8 4.5
2016-01-01 00:08:00 2016-01-01 00:08 1.1 4.1
2016-01-01 00:09:00 2016-01-01 00:09 1.5 4.1
2016-01-01 00:10:00 2016-01-01 00:10 1.6 4.1
2016-01-01 00:15:00 2016-01-01 00:15 1.6 4.1
Output:
Time A N
Time
2016-01-01 00:00:00 2016-01-01 00:00 1.2 4.2
2016-01-01 00:05:00 2016-01-01 00:05 1.6 4.1
2016-01-01 00:10:00 2016-01-01 00:10 1.6 4.1
2016-01-01 00:15:00 2016-01-01 00:15 1.6 4.1
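A small self-contained sketch of what this call does, using the values of column A from the question (the variable names are arbitrary). It picks values on a 5-minute grid; slopes are not computed in this step, so it would still need to be combined with a rolling slope calculation.

```python
import pandas as pd

idx = pd.date_range("2016-01-01 00:00", periods=11, freq="min")
df = pd.DataFrame(
    {"A": [1.2, 1.2, 1.2, 1.5, 1.1, 1.6, 1.7, 1.8, 1.1, 1.5, 1.6]},
    index=idx,
)

# Re-grid onto 5-minute bin edges; bfill takes the first value at or after each edge.
result = df.resample("5min").bfill()
print(result)
```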
Answer 4:
Hi, sorry to bring this old question back up, but I cannot follow the results :S
def calc_slope(x):
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# window of 3 rows; min_periods=3 so the first two rows are NaN
data['slope'] = data['a'].rolling(3, min_periods=3).apply(calc_slope)
print(data.to_string())
with a result of:
a slope
0 6.902663 NaN
1 2.257267 NaN
2 0.172393 -3.365135
3 9.642700 3.692717
4 1.221879 0.524743
5 1.634674 -4.004013
6 8.274599 3.526360
7 9.800035 4.082681
8 4.577713 -1.848443
9 1.368656 -4.215690
10 9.377983 2.400135
11 9.795934 4.213639
12 3.045406 -3.166288
13 6.063934 -1.866000
14 8.202430 2.578512
Any ideas?
Thanks
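The numbers above are consistent with a least-squares fit. For three equally spaced points the fitted slope reduces to (y[2] - y[0]) / 2, because the middle point sits at the mean of x and drops out. A small check using the first five values of column a from the output above (the variable names are arbitrary):

```python
import numpy as np
import pandas as pd

def calc_slope(x):
    return np.polyfit(np.arange(len(x)), x, 1)[0]

a = pd.Series([6.902663, 2.257267, 0.172393, 9.642700, 1.221879])

# Rolling least-squares slope over 3 rows, as in the post above.
fitted = a.rolling(3, min_periods=3).apply(calc_slope, raw=True)

# Closed form for a 3-point window: (last - first) / 2.
closed_form = (a - a.shift(2)) / 2

print(pd.DataFrame({"a": a, "fitted": fitted, "closed_form": closed_form}))
```

For example, row 2 gives (0.172393 - 6.902663) / 2 = -3.365135, which matches the slope column.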
Answer 5:
I use:
df['slope_I'] = df['I'].rolling('600s').apply(lambda x: (x[-1] - x[0]) / 600, raw=True)
where the slope has units of (units of I) per second. This requires a DatetimeIndex, and raw=True so that x is a NumPy array that supports x[-1].
The first number in the slope column is the slope of the line from the first row inside the window to the last, and so on as the window rolls. The first 600 s of the result come from partially filled windows (for a time-based window, min_periods defaults to 1), so you may want to replace them with zeros or with the mean.
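A minimal sketch of this on synthetic data, assuming a DatetimeIndex with one sample per minute (the frame, the 0.5-per-second ramp and the helper endpoint_slope are made up for illustration). It also computes a variant that divides by the time actually elapsed between the first and last sample in the window, since a '600s' window on minute data holds samples spanning only 540 s.

```python
import numpy as np
import pandas as pd

# Synthetic signal rising by exactly 0.5 per second, sampled once per minute.
idx = pd.date_range("2016-01-01", periods=30, freq="min")
seconds = np.arange(30) * 60.0
df = pd.DataFrame({"I": 0.5 * seconds}, index=idx)

# Fixed divisor of 600 s; raw=True hands the lambda a NumPy array.
df["slope_I"] = df["I"].rolling("600s").apply(lambda x: (x[-1] - x[0]) / 600, raw=True)

def endpoint_slope(s):
    # s is a Series (raw=False), so its index gives the window's timestamps.
    elapsed = (s.index[-1] - s.index[0]).total_seconds()
    return (s.iloc[-1] - s.iloc[0]) / elapsed if elapsed > 0 else np.nan

df["slope_I_elapsed"] = df["I"].rolling("600s").apply(endpoint_slope, raw=False)
print(df.tail())
```

With a full window the fixed-divisor column gives 0.45 rather than 0.5 here, because the ten samples in each window cover 540 s, not 600 s. The elapsed-time variant recovers 0.5.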
Best regards.
Source: https://stackoverflow.com/questions/42138357/pandas-rolling-slope-calculation