How to group a pandas dataframe by a defined time interval?

梦如初夏 2020-11-29 06:50

I have a DataFrame like the one below. I would like to group it into 60-minute intervals, with the bins starting at 06:30.

                           data
index
2017-02-14 06:29:57           


        
2 Answers
  • 2020-11-29 07:11

    Use DataFrame.resample, the dedicated method for resampling time series. With it there is no need for DataFrame.groupby and pd.Grouper:

    df.resample('60min', base=30, label='right').first()
    # base was removed in pandas 2.0; there, pass offset='30min' instead of base=30
    

    Output

                               data
    index                          
    2017-02-14 06:30:00  11198648.0
    2017-02-14 07:30:00  11198650.0
    2017-02-14 08:30:00         NaN
    2017-02-14 09:30:00         NaN
    2017-02-14 10:30:00         NaN
    2017-02-14 11:30:00         NaN
    2017-02-14 12:30:00         NaN
    2017-02-14 13:30:00         NaN
    2017-02-14 14:30:00         NaN
    2017-02-14 15:30:00         NaN
    2017-02-14 16:30:00         NaN
    2017-02-14 17:30:00         NaN
    2017-02-14 18:30:00         NaN
    2017-02-14 19:30:00         NaN
    2017-02-14 20:30:00         NaN
    2017-02-14 21:30:00         NaN
    2017-02-14 22:30:00         NaN
    2017-02-14 23:30:00  11207728.0
    

    Note: when the DataFrame has multiple columns, select the column you want to aggregate:

    df.resample('60min', base=30, label='right')['data'].first()
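
    On pandas 1.1 or later the same grouping can be written with offset='30min' in place of base=30 (base was deprecated in 1.1 and removed in 2.0). The following is a minimal sketch, not the asker's real data: the question's frame is not shown in full, so the "other" column and all values here are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (not the asker's real values).
idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:30:01",
    "2017-02-14 07:15:00",
    "2017-02-14 07:30:00",
])
df = pd.DataFrame({"data": [1, 2, 3, 4], "other": [10, 20, 30, 40]}, index=idx)

# Hourly bins whose edges sit at hh:30, each labelled by its right edge.
bins = df.resample("60min", offset="30min", label="right")

all_cols = bins.first()          # first row of every column in each bin
one_col = bins["data"].first()   # only the "data" column, returned as a Series

print(all_cols)
print(one_col)
```

    Selecting the column before aggregating returns a Series, which is convenient when only one value per bin is needed.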
    
  • 2020-11-29 07:19

    Use base=30 together with label='right' in pd.Grouper.

    Specifying label='right' labels each bin with its right (later) edge, so the first group is labelled 06:30 rather than 05:30. base defaults to 0, which puts the bin edges on the hour; setting it to 30 shifts the edges to half past the hour.
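
    To see the effect of label in isolation, here is a small sketch that bins the single timestamp from the question both ways. It assumes pandas 1.1 or later and therefore uses offset='30min' rather than base=30; the value in the "data" column is hypothetical.

```python
import pandas as pd

# Hypothetical one-row frame using the timestamp from the question.
df = pd.DataFrame({"data": [1]}, index=pd.to_datetime(["2017-02-14 06:29:57"]))

# offset="30min" is the pandas >= 1.1 spelling of base=30 for a 60-minute frequency.
left = df.resample("60min", offset="30min", label="left").first()
right = df.resample("60min", offset="30min", label="right").first()

print(left.index[0])   # left edge of the bin that contains 06:29:57
print(right.index[0])  # right edge of the same bin
```

    Both calls put the row in the same bin; only the timestamp used to name that bin differs.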

    Suppose you want to take the first element of each group:

    df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
    # same thing using resample - df.resample('60Min', base=30, label='right').first()
    

    yields:

                               data
    index                          
    2017-02-14 06:30:00  11198648.0
    2017-02-14 07:30:00  11198650.0
    2017-02-14 08:30:00         NaN
    2017-02-14 09:30:00         NaN
    2017-02-14 10:30:00         NaN
    2017-02-14 11:30:00         NaN
    2017-02-14 12:30:00         NaN
    2017-02-14 13:30:00         NaN
    2017-02-14 14:30:00         NaN
    2017-02-14 15:30:00         NaN
    2017-02-14 16:30:00         NaN
    2017-02-14 17:30:00         NaN
    2017-02-14 18:30:00         NaN
    2017-02-14 19:30:00         NaN
    2017-02-14 20:30:00         NaN
    2017-02-14 21:30:00         NaN
    2017-02-14 22:30:00         NaN
    2017-02-14 23:30:00  11207728.0
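
    The equivalence between pd.Grouper and resample can be checked on a small frame. This is a sketch under assumptions: the data is made up, and it uses offset='30min' (pandas 1.1 or later) because base is no longer accepted in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample data.
idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:45:00",
    "2017-02-14 23:10:00",
])
df = pd.DataFrame({"data": [1.0, 2.0, 3.0]}, index=idx)

grouper = pd.Grouper(freq="60min", offset="30min", label="right")
grouped = df.groupby(grouper).first()
resampled = df.resample("60min", offset="30min", label="right").first()

# Bins with no rows show up as NaN, as in the output above.
print(grouped)
print(grouped.equals(resampled))
```

    If the empty bins are not wanted, calling dropna() on the result removes them.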
    