Pandas group by time with specified start time with non integer minutes

て烟熏妆下的殇ゞ submitted on 2019-12-01 05:48:45

Question


I have a dataframe with one-hour-long signals, and I want to group them into 10-minute buckets. The problem is that the starting time is not an exact multiple of 10 minutes, so instead of 6 groups I get 7, with the first and the last incomplete.

The issue can be reproduced with:

import pandas as pd
import numpy as np
import datetime as dt

rng = pd.date_range('1/1/2011 00:05:30', periods=3600, freq='1S')
ts = pd.DataFrame({'a':np.random.randn(len(rng)),'b':np.random.randn(len(rng))}, index=rng)

interval = dt.timedelta(minutes=10)

ts.groupby(pd.Grouper(freq=interval)).apply(len)

2011-01-01 00:00:00    270
2011-01-01 00:10:00    600
2011-01-01 00:20:00    600
2011-01-01 00:30:00    600
2011-01-01 00:40:00    600
2011-01-01 00:50:00    600
2011-01-01 01:00:00    330
Freq: 10T, dtype: int64

I tried to solve it as described here, but base only takes an integer number of minutes. For the example above (starting 30 seconds after 00:05), the code below still doesn't work:

ts.groupby(pd.Grouper(freq=interval, base=ts.index[0].minute)).apply(len)

How can I set a generic starting time for the Grouper? My expected output here would be:

2011-01-01 00:05:30    600
2011-01-01 00:15:30    600
2011-01-01 00:25:30    600
2011-01-01 00:35:30    600
2011-01-01 00:45:30    600
2011-01-01 00:55:30    600

Answer 1:


base accepts a float argument, so in addition to the minutes you must also account for the seconds, expressed as a fraction of a minute:

base = ts.index[0].minute + ts.index[0].second/60
ts.groupby(pd.Grouper(freq=interval, base=base)).size()

2011-01-01 00:05:30    600
2011-01-01 00:15:30    600
2011-01-01 00:25:30    600
2011-01-01 00:35:30    600
2011-01-01 00:45:30    600
2011-01-01 00:55:30    600
Freq: 10T, dtype: int64
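
Note that base was deprecated in pandas 1.1 and removed in pandas 2.0, so the snippet above only runs on older versions. On newer versions the same result can probably be obtained with the origin or offset arguments of pd.Grouper. The following is a minimal sketch under that assumption (pandas 1.1 or later); the variable names by_origin and by_offset are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 3600 one-second samples starting at 00:05:30.
rng = pd.date_range("2011-01-01 00:05:30", periods=3600,
                    freq=pd.Timedelta(seconds=1))
ts = pd.DataFrame(
    {"a": np.random.randn(len(rng)), "b": np.random.randn(len(rng))},
    index=rng,
)

interval = pd.Timedelta(minutes=10)

# Option 1: anchor the bins at the first timestamp of the index.
by_origin = ts.groupby(pd.Grouper(freq=interval, origin="start")).size()

# Option 2: keep the default midnight origin and shift the bin edges by the
# time elapsed since midnight, reduced modulo the interval (5 min 30 s here).
offset = (ts.index[0] - ts.index[0].normalize()) % interval
by_offset = ts.groupby(pd.Grouper(freq=interval, offset=offset)).size()

print(by_origin)
print(by_offset)
```

Both variants should give six buckets of 600 rows each, with the first bucket labelled 2011-01-01 00:05:30. origin="start" is the shorter choice when the bins should simply follow the data; offset is useful when the shift is known in advance and should not depend on the first row.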


Source: https://stackoverflow.com/questions/54206570/pandas-group-by-time-with-specified-start-time-with-non-integer-minutes
