Pandas groupby month and year

闹比i 2020-11-27 11:18

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80


        
4 answers
  •  予麋鹿 (OP)
    2020-11-27 12:01

    You can use either resample or Grouper (which resamples under the hood).

    First make sure that the date column actually contains datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

    In [11]: df1
    Out[11]:
                abc  xyz
    Date
    2013-06-01  100  200
    2013-06-03  -20   50
    2013-08-15   40   -5
    2014-01-20   25   15
    2014-02-21   60   80
    
    In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)
    
    In [13]: g.sum()
    Out[13]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80
    
    In [14]: df1.resample("M").sum()  # the same
    Out[14]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80
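
For anyone following along on a newer pandas, here is a minimal, self-contained sketch of the same two approaches. The values are the ones from the question; `raw`, `month_end`, `by_grouper` and `by_resample` are just names picked for this example. It passes an offset object rather than the "M" string, and `min_count=1` is there on the assumption that you want empty months to show up as NaN as in the output above (without it, recent versions sum an empty month to 0).

```python
import pandas as pd

# Sample data copied from the question; dates are strings like "01-Jun-13".
raw = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings into real datetimes, then move them into the index.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d-%b-%y")
df1 = raw.set_index("Date")

# A month-end offset object stands in for the "M" frequency string.
month_end = pd.offsets.MonthEnd()

# Approach 1: groupby with a Grouper on the DatetimeIndex.
by_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)

# Approach 2: resample directly.
by_resample = df1.resample(month_end).sum(min_count=1)

print(by_grouper)
print(by_resample)
```

Both frames should come out with nine monthly rows, 80 and 250 for June 2013, and NaN for the months that have no data.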
    

    Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since pandas 0.21.
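
Grouper also takes a `key` argument, so the same monthly grouping can be sketched against a plain datetime64 column without moving it into the index first. This is only an illustration on the question's data: the name `monthly` is made up for the example, and `min_count=1` is again an assumption that you want NaN rather than 0 for empty months.

```python
import pandas as pd

# Same data as in the question, with Date kept as an ordinary column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# key= tells Grouper which column holds the datetimes;
# the month-end offset plays the role of the "M" frequency.
grouper = pd.Grouper(key="Date", freq=pd.offsets.MonthEnd())

# Sum the two value columns per month; min_count=1 keeps empty months as NaN.
monthly = df.groupby(grouper)[["abc", "xyz"]].sum(min_count=1)

print(monthly)
```

Because this goes through the same binning as resample, the months without data are included in the result.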


    I had thought the following would work, but it doesn't (possibly because as_index isn't respected; I'm not sure). I'm including it for interest's sake.

    If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

    In [21]: df
    Out[21]:
            Date  abc  xyz
    0 2013-06-01  100  200
    1 2013-06-03  -20   50
    2 2013-08-15   40   -5
    3 2014-01-20   25   15
    4 2014-02-21   60   80
    
    In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
    Out[22]:
    <class 'pandas.tseries.period.PeriodIndex'>
    [2013-06, ..., 2014-02]
    Length: 5, Freq: M
    
    In [23]: per = df.Date.dt.to_period("M")  # new way to get the same
    
    In [24]: g = df.groupby(per)
    
    In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
    Out[25]:
             abc  xyz
    2013-06   80  250
    2013-08   40   -5
    2014-01   25   15
    2014-02   60   80
    

    To get the desired result we have to reindex...
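
One way that reindexing could look, as a rough sketch: build the full run of months with pd.period_range and reindex the grouped result against it. The names `monthly`, `full_range` and `filled` are just for this example, and the value columns are selected explicitly on the assumption that you are on a recent pandas, which may refuse to sum the datetime column.

```python
import pandas as pd

# Same data as in the question, with Date as a datetime64 column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group by monthly period; only months that occur in the data appear here.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last period, including the empty ones.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")

# Reindexing inserts the missing months as rows of NaN.
filled = monthly.reindex(full_range)

print(filled)
```

If zeros are preferable to NaN for the empty months, `monthly.reindex(full_range, fill_value=0)` should do it.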
