Groupby with TimeGrouper 'backwards'

遥遥无期 2021-01-01 02:23

I have a DataFrame containing a time series:

import pandas as pd

rng = pd.date_range('2016-06-01', periods=24*7, freq='h')
ones = pd.Series([1]*24*7, rng)
rdf = pd.DataFrame({'a': ones})

  • 2021-01-01 03:08

    Since I primarily want to group by 7 days (one week), I am now using this method to get the desired bins:

    import pandas as pd
    from pandas.tseries.offsets import Week
    
    # Deliberately use a range that does not cover a whole number of weeks (24 days)
    hours = 24*6*4
    rng = pd.date_range('2016-06-01', periods=hours, freq='h')
    
    # Anchor the weekly bins on the weekday of the last timestamp in the range
    print("Last day is %s" % rng[-1])
    freq = Week(weekday=rng[-1].weekday())
    
    ones = pd.Series([1]*hours, rng)
    rdf = pd.DataFrame({'a': ones})
    # pd.TimeGrouper was deprecated and later removed; pd.Grouper takes the same freq/closed/label arguments
    rdf.groupby(pd.Grouper(freq=freq, closed='right', label='right')).sum()
    

    This gives me the desired output:

    2016-06-25  96
    2016-07-02  168
    2016-07-09  168
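    The same idea as a small reusable helper, for pandas versions where pd.TimeGrouper no longer exists. This is a sketch, assuming pandas 1.x or newer; weekly_sums_ending_last_day is a hypothetical name, not a pandas function. Because the Week anchor is taken from the last timestamp and the bins are closed on the right, the final bin is labelled with that last day and includes all of its hours:

```python
import pandas as pd
from pandas.tseries.offsets import Week


def weekly_sums_ending_last_day(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: weekly sums whose last bin ends on the weekday
    of the final timestamp. pd.Grouper replaces the removed pd.TimeGrouper."""
    # Week(weekday=4) is the same anchor as the 'W-FRI' alias, and so on
    anchor = Week(weekday=df.index[-1].weekday())
    grouper = pd.Grouper(freq=anchor, closed="right", label="right")
    return df.groupby(grouper).sum()


# Same setup as above: 24 days of hourly ones, not a whole number of weeks
hours = 24 * 6 * 4
rng = pd.date_range("2016-06-01", periods=hours, freq="h")
rdf = pd.DataFrame({"a": [1] * hours}, index=rng)
print(weekly_sums_ending_last_day(rdf))
```

    Only the earliest bin can be shorter than a week, which is the "backwards" behaviour the question asks for.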
    
  • 2021-01-01 03:18

    Since the question now focuses on grouping by week, you can simply anchor the weekly frequency on the weekday of the last timestamp:

    rdf.resample('W-{}'.format(rdf.index[-1].strftime('%a').upper()), closed='right', label='right').sum()
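    The format string above builds the same anchored weekly alias that Week(weekday=...) produces in the other answer. A minimal sketch of that equivalence, assuming Timestamp.day_name() returns English weekday names (its default); weekly_alias is a hypothetical helper, not part of pandas:

```python
import pandas as pd
from pandas.tseries.offsets import Week


def weekly_alias(ts: pd.Timestamp) -> str:
    """Hypothetical helper: the 'W-XXX' alias anchored on ts's weekday."""
    # First three letters of the English day name, upper-cased, match the
    # anchored aliases pandas documents (W-MON ... W-SUN)
    return "W-" + ts.day_name()[:3].upper()


hours = 24 * 6 * 4
rng = pd.date_range("2016-06-01", periods=hours, freq="h")
rdf = pd.DataFrame({"a": [1] * hours}, index=rng)

last = rdf.index[-1]
alias = weekly_alias(last)
# Resample once with the string alias and once with the equivalent offset object
by_alias = rdf.resample(alias, closed="right", label="right").sum()
by_offset = rdf.resample(Week(weekday=last.weekday()), closed="right", label="right").sum()
print(alias)
print(by_alias.equals(by_offset))
```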
    

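    You can use loffset to get it to work, at least for most periods, with .resample() (loffset was deprecated in pandas 1.1 and removed in pandas 2.0, so this snippet needs an older pandas):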

    for i in range(2, 7):
        print(i)
        print(rdf.resample('{}D'.format(i), closed='right', loffset='{}D'.format(i)).sum())
    
    2
                 a
    2016-06-01  24
    2016-06-03  48
    2016-06-05  48
    2016-06-07  48
    3
                 a
    2016-06-01  24
    2016-06-04  72
    2016-06-07  72
    4
                 a
    2016-06-01  24
    2016-06-05  96
    2016-06-09  48
    5
                  a
    2016-06-01   24
    2016-06-06  120
    2016-06-11   24
    6
                  a
    2016-06-01   24
    2016-06-07  144
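    Since loffset no longer exists in pandas 2.0 and later, a rough equivalent is to resample first and then shift the labels yourself: loffset only adjusted the resampled time labels and did not change which rows fall into each bin. A sketch under that assumption; resample_shifted is a hypothetical name, and the exact bin contents depend on the closed and origin settings of your pandas version, so they may not match the output above:

```python
import pandas as pd


def resample_shifted(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Hypothetical helper: right-closed `days`-day sums whose labels are
    moved forward by one bin width, as loffset='{days}D' used to do."""
    width = pd.Timedelta(days=days)
    out = df.resample(f"{days}D", closed="right").sum()
    # Relabel only: the bin membership is exactly that of the plain resample
    out.index = out.index + width
    return out


rng = pd.date_range("2016-06-01", periods=24 * 7, freq="h")
rdf = pd.DataFrame({"a": [1] * len(rng)}, index=rng)
for i in range(2, 7):
    print(i)
    print(resample_shifted(rdf, i))
```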
    

    Alternatively, you can build custom groupings that compute the correct values without TimeGrouper:

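    # Days of the month in the index, newest first (assumes no day number repeats in the range)
    days = rdf.index.to_series().dt.day.unique()[::-1]
    for n in range(2, 7):
        # Chunk the days into groups of n starting from the last day, then restore chronological order
        chunks = [days[i:i + n] for i in range(0, len(days), n)][::-1]
        # Map each day to its chunk number (0 = earliest, possibly partial, group)
        grp = pd.Series({day: idx for idx, chunk in enumerate(chunks) for day in chunk})
        print(n)
        print(rdf.groupby(rdf.index.to_series().dt.day.map(grp).rename('groups'))['a'].sum())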
    
     2
    groups
    0    24
    1    48
    2    48
    3    48
    Name: a, dtype: int64
    
     3
    groups
    0    24
    1    72
    2    72
    Name: a, dtype: int64
    
     4
    groups
    0    72
    1    96
    Name: a, dtype: int64
    
     5
    groups
    0     48
    1    120
    Name: a, dtype: int64
    
     6
    groups
    0     24
    1    144
    Name: a, dtype: int64
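    The day-of-month mapping above relies on each day number appearing only once in the index, so it breaks once the range is long enough for a day number to repeat. A sketch of the same backwards grouping based on elapsed time instead, assuming a sorted DatetimeIndex; sum_backwards is a hypothetical name. Each row is numbered by how many whole bin widths it lies before the last timestamp, so only the earliest bin can be partial:

```python
import pandas as pd


def sum_backwards(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Hypothetical helper: sum `df` in `days`-day bins counted back from
    the last timestamp. Each bin is labelled by its right (inclusive) edge."""
    width = pd.Timedelta(days=days)
    last = df.index[-1]
    # 0 for the newest bin, 1 for the one before it, and so on
    bin_no = (last - df.index) // width
    out = df.groupby(bin_no).sum()
    # Turn bin numbers back into right-edge timestamps, oldest bin first
    out.index = last - pd.to_timedelta(out.index * days, unit="D")
    return out.sort_index()


rng = pd.date_range("2016-06-01", periods=24 * 7, freq="h")
rdf = pd.DataFrame({"a": [1] * len(rng)}, index=rng)
for n in range(2, 7):
    print(n)
    print(sum_backwards(rdf, n))
```

    The sums agree with the day-based groupings above, but because the bins come from elapsed time rather than day numbers, the same function also works when the data crosses a month boundary.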
    