Grouping DataFrame by start of decade using pandas Grouper

若如初见. Submitted on 2019-12-01 03:28:15

Question


I have a dataframe of daily observations from 01-01-1973 to 12-31-2014.

I have been using pandas Grouper, and it has worked fine for every frequency until now: I want to group the observations by decade (70s, 80s, 90s, etc.).

I tried to do it as

import pandas as pd
df.groupby(pd.Grouper(freq = '10Y')).mean()

However, this groups them into 73-83, 83-93, etc.


Answer 1:


You can do a little arithmetic on the year to round it down to the start of its decade:

df.groupby(df.index.year // 10 * 10).mean()
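
As a rough, self-contained sketch of the same idea: the frame below is made-up data (the `data` column and its values are assumptions, not the asker's), indexed by a daily `DatetimeIndex` over the range mentioned in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering the date range described in the question.
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": np.arange(len(idx), dtype=float)}, index=idx)

# Integer-divide the year by 10, then multiply back: 1973 -> 1970, 1989 -> 1980.
decade = (df.index.year // 10 * 10).rename("decade")

# Group on the decade keys; the result has one row per decade start year.
decade_means = df.groupby(decade).mean()
print(decade_means)
```

The group keys here are plain integers (1970, 1980, ...) rather than timestamps, which is usually enough for labelling decades.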



Answer 2:


pd.cut also works: it lets you bin the dates at a regular frequency, starting from a year you choose.

import pandas as pd
df
                 date  val
0 1970-01-01 00:01:18    1
1 1979-12-31 18:01:01   12
2 1980-01-01 00:00:00    2
3 1989-01-01 00:00:00    3
4 2014-05-06 00:00:00    4

df.groupby(pd.cut(df.date, pd.date_range('1970', '2020', freq='10YS'), right=False)).mean()
#                          val
#date                         
#[1970-01-01, 1980-01-01)  6.5
#[1980-01-01, 1990-01-01)  2.5
#[1990-01-01, 2000-01-01)  NaN
#[2000-01-01, 2010-01-01)  NaN
#[2010-01-01, 2020-01-01)  4.0



Answer 3:


@cᴏʟᴅsᴘᴇᴇᴅ's method is cleaner than this, but if you want to keep your pd.Grouper approach, one way is to merge your data with a new date range that starts at the beginning of a decade and ends at the end of a decade, then apply your Grouper to the result. For example, given an initial df:

        date      data
0     1973-01-01 -1.097895
1     1973-01-02  0.834253
2     1973-01-03  0.134698
3     1973-01-04 -1.211177
4     1973-01-05  0.366136
...
15335 2014-12-27 -0.566134
15336 2014-12-28 -1.100476
15337 2014-12-29  0.115735
15338 2014-12-30  1.635638
15339 2014-12-31  1.930645

Merge that with a date_range dataframe ranging from the start of 1970 to the end of 2019:

new_df = pd.DataFrame({'date':pd.date_range(start='01-01-1970', end='12-31-2019', freq='D')})

df = new_df.merge(df, on ='date', how='left')

And use your Grouper:

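# '10YS' = 10-year bins anchored at year start ('AS' is the older alias, deprecated in recent pandas)
df.groupby(pd.Grouper(key='date', freq = '10YS')).mean()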

Which gives you:

                data
date                
1970-01-01 -0.005455
1980-01-01  0.028066
1990-01-01  0.011122
2000-01-01  0.011213
2010-01-01  0.029592

The same, but in one go, could look like this:

(df.merge(pd.DataFrame(
    {'date':pd.date_range(start='01-01-1970',
                          end='12-31-2019',
                          freq='D')}),
          how='right')
 .groupby(pd.Grouper(key='date', freq = '10YS'))
 .mean())



Answer 4:


Something like

# Keep the first three digits of the year and append '0': '1973-01-01' -> '197' -> '1970'
df.groupby(df.index.astype(str).str[:3] + '0').mean()


Source: https://stackoverflow.com/questions/50145982/grouping-dataframe-by-start-of-decade-using-pandas-grouper
