How to group and count rows by month and year using Pandas?

情歌与酒 2020-12-13 20:33

I have a dataset with personal data such as name, height, weight and date of birth. I would like to build a graph showing the number of people born in each month and year. I'm

5 answers
  • Another solution is to set birthdate as the index and resample:

    import pandas as pd
    
    df = pd.DataFrame({'birthdate': pd.date_range(start='2015-12-20', end='2016-03-01')})
    df.set_index('birthdate').resample('MS').size()
    

    Output:

    birthdate
    2015-12-01    12
    2016-01-01    31
    2016-02-01    29
    2016-03-01     1
    Freq: MS, dtype: int64
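
Since the goal is a graph, it may help to turn the month-start timestamps into short labels before plotting. The following is only a minimal sketch under that assumption; the "YYYY-MM" label format is my own choice and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"birthdate": pd.date_range(start="2015-12-20", end="2016-03-01")})

# Count rows per calendar month; "MS" labels each bin with the month start.
counts = df.set_index("birthdate").resample("MS").size()

# Assumed presentation step: replace timestamps with "YYYY-MM" strings
# so they can serve directly as axis labels.
counts.index = counts.index.strftime("%Y-%m")
print(counts)
```

The resulting Series can then be passed to whatever plotting tool you prefer.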
    
  • 2020-12-13 21:00

    Replace date and count with your own column names. This code groups the rows by month, sums the count column within each month, and sorts the result by date. You can also change the frequency to 2M and so on (in pandas 2.2 and later the month-end alias is spelled ME rather than M).

    df[['date', 'count']].groupby(pd.Grouper(key='date', freq='1M')).sum().sort_values(by='date', ascending=True)['count']
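
Note that the one-liner above sums an existing count column; it does not count rows. A small sketch of the difference, using made-up data (the three rows and their values are hypothetical) and the MS alias, which labels each bin with the month start instead of the month end:

```python
import pandas as pd

# Hypothetical data: three rows with a pre-existing "count" column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03"]),
    "count": [1, 2, 4],
})

monthly = df.groupby(pd.Grouper(key="date", freq="MS"))

monthly_sum = monthly["count"].sum()  # adds up the "count" values per month
monthly_rows = monthly.size()         # counts rows per month, ignoring "count"

print(monthly_sum)
print(monthly_rows)
```

If your frame has one row per person and no count column, size() is probably the one you want.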
    
  • 2020-12-13 21:02

    You can also convert the dates to monthly periods with to_period via the dt accessor and group on those:

    In [11]: df = pd.DataFrame({'birthdate': pd.date_range(start='2015-12-20', end='2016-03-01')})
    
    In [12]: df['birthdate'].groupby(df.birthdate.dt.to_period("M")).agg('count')
    Out[12]:
    birthdate
    2015-12    12
    2016-01    31
    2016-02    29
    2016-03     1
    Freq: M, Name: birthdate, dtype: int64
    

    Note that if the datetime is the index (rather than a column), you can use resample directly:

    df.resample("M").count()
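
One practical difference between the two approaches: grouping on periods only returns months that actually occur in the data, while resample also emits empty months with a count of 0. A minimal sketch with three made-up birthdates (the values are hypothetical), using the MS alias for the resample:

```python
import pandas as pd

# Hypothetical birthdates: two in January 2016, none in February, one in March.
birthdates = pd.Series(
    pd.to_datetime(["2016-01-03", "2016-01-28", "2016-03-15"]), name="birthdate"
)

# Grouping on monthly periods: February is simply absent from the result.
by_period = birthdates.groupby(birthdates.dt.to_period("M")).size()

# Resampling a datetime-indexed Series: February appears with a count of 0.
by_resample = pd.Series(1, index=pd.DatetimeIndex(birthdates)).resample("MS").size()

print(by_period)
print(by_resample)
```

For a graph, the resample version keeps gaps visible on the time axis, which may or may not be what you want.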
    
  • 2020-12-13 21:04

    As of April 2019, the following works with pandas 0.24.x:

    df.groupby([df.dates.dt.year.rename('year'), df.dates.dt.month.rename('month')]).size()
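
The result of that call is a Series with a (year, month) MultiIndex. If a flat table is easier to plot, one possible follow-up is reset_index; this is just a sketch, and the sample date range and the output column name count are my assumptions:

```python
import pandas as pd

# Assumed sample data with a datetime column named "dates", as in the answer.
df = pd.DataFrame({"dates": pd.date_range(start="2015-12-20", end="2016-03-01")})

counts = (
    df.groupby([df["dates"].dt.year.rename("year"),
                df["dates"].dt.month.rename("month")])
      .size()
      .reset_index(name="count")  # flatten the MultiIndex into ordinary columns
)
print(counts)
```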

  • 2020-12-13 21:09

    To group on multiple criteria, pass a list of the columns or criteria:

    df['birthdate'].groupby([df.birthdate.dt.year, df.birthdate.dt.month]).agg('count')
    

    Example:

    In [165]:
    import datetime as dt
    df = pd.DataFrame({'birthdate': pd.date_range(start=dt.datetime(2015, 12, 20), end=dt.datetime(2016, 3, 1))})
    df.groupby([df['birthdate'].dt.year, df['birthdate'].dt.month]).agg(['count'])
    
    Out[165]:
                        birthdate
                            count
    birthdate birthdate          
    2015      12               12
    2016      1                31
              2                29
              3                 1
    

    UPDATE

    As of version 0.23.0, the above code no longer works because multi-index level names must be unique. You now need to rename the levels for it to work:

    In[107]:
    df.groupby([df['birthdate'].dt.year.rename('year'), df['birthdate'].dt.month.rename('month')]).agg(['count'])
    
    Out[107]: 
               birthdate
                   count
    year month          
    2015 12           12
    2016 1            31
         2            29
         3             1
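
With the levels renamed, the counts can also be pivoted into a year-by-month table, which might be convenient for a grouped bar chart. A hedged sketch, assuming the same sample data; using size with unstack is my choice here, not something the answer prescribes:

```python
import pandas as pd

df = pd.DataFrame({"birthdate": pd.date_range(start="2015-12-20", end="2016-03-01")})

year = df["birthdate"].dt.year.rename("year")
month = df["birthdate"].dt.month.rename("month")

# Rows are years, columns are months; combinations with no births become 0.
table = df.groupby([year, month]).size().unstack("month", fill_value=0)
print(table)
```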
    