Question
I have a time series that I am resampling into 5-second windows, like so:
INDEX size price
2018-05-07 21:53:13.731 0.365127 9391.800000
2018-05-07 21:53:16.201 0.666127 9391.800000
2018-05-07 21:53:18.038 0.143104 9391.800000
2018-05-07 21:53:18.243 0.025643 9391.800000
2018-05-07 21:53:18.265 0.640484 9391.800000
2018-05-07 21:53:18.906 -0.100000 9391.793421
2018-05-07 21:53:19.829 0.559516 9391.800000
2018-05-07 21:53:19.846 0.100000 9391.800000
2018-05-07 21:53:19.870 0.006560 9391.800000
2018-05-07 21:53:20.734 0.666076 9391.800000
2018-05-07 21:53:20.775 0.666076 9391.800000
2018-05-07 21:53:28.607 0.100000 9391.800000
2018-05-07 21:53:28.610 0.041991 9391.800000
2018-05-07 21:53:29.283 -0.053518 9391.793421
2018-05-07 21:53:47.322 -0.046302 9391.793421
2018-05-07 21:53:49.182 0.100000 9391.800000
import numpy as np
import pandas as pd

def tick_features(x):
    # Sizes can be negative, so volume is the sum of absolute sizes.
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    return pd.Series([volume, num_trades], index=['volume', 'num_trades'])

tick = tick.groupby(pd.Grouper(freq='5S')).apply(tick_features)
How would I go about getting the first and last elements of each 5S window via pd.Grouper() and .apply()?
I could do something similar with .resample().agg() and {'price': 'first'}, but for other reasons I'd like to do it via pd.Grouper() if possible.
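For comparison, the .resample().agg() route mentioned in the question might look like the sketch below. The three-row sample frame is a made-up stand-in for the real tick data, and the lowercase '5s' alias is used because newer pandas releases prefer it over '5S'.

```python
import pandas as pd

# Hypothetical three-row sample shaped like the question's data.
idx = pd.to_datetime([
    "2018-05-07 21:53:13.731",
    "2018-05-07 21:53:16.201",
    "2018-05-07 21:53:19.870",
])
sample = pd.DataFrame(
    {"size": [0.365127, 0.666127, 0.006560],
     "price": [9391.80, 9391.80, 9391.79]},
    index=idx,
)

# First and last price in each 5-second bin.
resampled = sample.resample("5s").agg({"price": ["first", "last"]})
print(resampled)
```

The result has a two-level column index, so the first price of each bin is read with resampled[("price", "first")].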
Answer 1:
I suggest selecting the size column and using SeriesGroupBy.agg with a list of (name, function) tuples that includes the functions first and last:
tick_features = [('volume', lambda x: x.abs().sum()),
                 ('num_trades', 'count'),
                 ('first_trade', 'first'),
                 ('last_trade', 'last')]
tick = tick.groupby(pd.Grouper(freq='5S'))['size'].agg(tick_features)
print(tick)
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127           1     0.365127    0.365127
2018-05-07 21:53:15  2.241434           8     0.666127    0.006560
2018-05-07 21:53:20  1.332152           2     0.666076    0.666076
2018-05-07 21:53:25  0.195509           3     0.100000   -0.053518
2018-05-07 21:53:30  0.000000           0          NaN         NaN
2018-05-07 21:53:35  0.000000           0          NaN         NaN
2018-05-07 21:53:40  0.000000           0          NaN         NaN
2018-05-07 21:53:45  0.146302           2    -0.046302    0.100000
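If first and last values are needed from more than one column, named aggregation is one option that still works with pd.Grouper(). This is a sketch rather than part of the original answer: it assumes pandas 0.25 or later, and the sample frame and the output names first_price and last_price are invented for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data.
idx = pd.to_datetime([
    "2018-05-07 21:53:13.731",
    "2018-05-07 21:53:16.201",
    "2018-05-07 21:53:18.906",
])
sample = pd.DataFrame(
    {"size": [0.365127, 0.666127, -0.100000],
     "price": [9391.8, 9391.8, 9391.793421]},
    index=idx,
)

# Each keyword is an output column: (source column, aggregation).
features = sample.groupby(pd.Grouper(freq="5s")).agg(
    volume=("size", lambda s: s.abs().sum()),
    num_trades=("size", "count"),
    first_price=("price", "first"),
    last_price=("price", "last"),
)
print(features)
```

Keeping every feature in one agg call avoids a separate merge step when the features come from different columns.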
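A solution with apply is also possible, but it needs an if-else statement, because .iloc[0] and .iloc[-1] raise an IndexError on the empty groups that come from windows without trades: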
def tick_features(x):
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    if not x.empty:
        f = x['size'].iloc[0]
        l = x['size'].iloc[-1]
    else:
        # Empty window: there is no first or last trade.
        f = np.nan
        l = np.nan
    return pd.Series([volume, num_trades, f, l],
                     index=['volume', 'num_trades', 'first_trade', 'last_trade'])

tick = tick.groupby(pd.Grouper(freq='5S')).apply(tick_features)
print(tick)
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127         1.0     0.365127    0.365127
2018-05-07 21:53:15  2.241434         8.0     0.666127    0.006560
2018-05-07 21:53:20  1.332152         2.0     0.666076    0.666076
2018-05-07 21:53:25  0.195509         3.0     0.100000   -0.053518
2018-05-07 21:53:30  0.000000         0.0          NaN         NaN
2018-05-07 21:53:35  0.000000         0.0          NaN         NaN
2018-05-07 21:53:40  0.000000         0.0          NaN         NaN
2018-05-07 21:53:45  0.146302         2.0    -0.046302    0.100000
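The reason for the empty-group guard can be seen in isolation. The sketch below uses a hypothetical helper, first_last, that is not part of the original answer; it only illustrates that positional access on an empty Series fails and how a guard avoids that.

```python
import numpy as np
import pandas as pd

def first_last(s):
    """Return (first, last) of a Series, or (NaN, NaN) if it is empty."""
    if s.empty:
        return np.nan, np.nan
    return s.iloc[0], s.iloc[-1]

empty = pd.Series([], dtype="float64")

# Positional access on an empty Series raises IndexError.
try:
    empty.iloc[0]
    raised = False
except IndexError:
    raised = True

print(raised)
print(first_last(empty))
print(first_last(pd.Series([0.1, 0.041991, -0.053518])))
```

The same check could replace the if-else inside tick_features, which keeps the feature function shorter when more first and last columns are added.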
Source: https://stackoverflow.com/questions/50286010/get-first-and-last-elements-with-pd-grouper