问题
I am having some trouble with pandas rolling. Here a simplify version of my dataset:
df2 = pd.DataFrame({
'A' : pd.Categorical(["test","train","test","train",'train','hello']),
'B' : (pd.Timestamp('2013-01-02 00:00:05'),
pd.Timestamp('2013-01-02 00:00:10'),
pd.Timestamp('2013-01-02 00:00:09'),
pd.Timestamp('2013-01-02 00:01:05'),
pd.Timestamp('2013-01-02 00:01:25'),
pd.Timestamp('2013-01-02 00:02:05')),
'C' : 1.}).sort_values('A').reset_index(drop=True)
>>> df2
A B C
0 hello 2013-01-02 00:02:05 1.0
1 test 2013-01-02 00:00:05 1.0
2 test 2013-01-02 00:00:09 1.0
3 train 2013-01-02 00:00:10 1.0
4 train 2013-01-02 00:01:05 1.0
5 train 2013-01-02 00:01:25 1.0
I would like to have a rolling window of 10s, to get the following output:
A count
0 hello 1
1 test 2
3 train 1
I try the groupby and rolling.
df2.groupby('A').rolling('10s', on='B', closed='right').C.sum()
I get the rolling windows from the past '10s' observation, which is not what i am looking for:
A B
hello 2013-01-02 00:02:05 1.0
test 2013-01-02 00:00:05 1.0
2013-01-02 00:00:09 2.0
train 2013-01-02 00:00:10 1.0
2013-01-02 00:01:05 1.0
2013-01-02 00:01:25 1.0
I also try resampling, but I am not able to get the result.
grouped = df3.set_index('B').groupby('A').resample('S' )['C'].count()
grouped.reset_index().groupby('A').rolling(window=10,on='B' , min_periods=1).sum()
回答1:
I think you have to try this:
df2.groupby('A').rolling('11s', on='B').agg({'C': 'sum'}).groupby('A').max()
回答2:
This might do the trick on this example, but i'm not sure it's a good enough solution :
df2.groupby('A').rolling('10s', on='B').agg({'C': 'sum'}).groupby("A").max()
来源:https://stackoverflow.com/questions/46458940/pandas-groupby-rolling-uneven-time