Using Pandas GroupBy and size()/count() to generate an aggregated DataFrame

末鹿安然 提交于 2019-12-12 12:19:45

问题


So I currently have a DataFrame called df that goes:

date                       tag
2011-02-18 12:57:00-07:00  A
2011-02-19 12:57:00-07:00  A
2011-03-18 12:57:00-07:00  B
2011-04-01 12:57:00-07:00  C
2011-05-19 12:57:00-07:00  Z
2011-06-03 12:57:00-07:00  A
2011-06-05 12:57:00-07:00  A
...

I'm trying to do a GroupBy the tag, and the date (yr/month), so it looks like:

date     A  B  C  Z
2011-02  2  0  0  0
2011-03  0  1  0  0
2011-04  0  0  1  0
2011-05  0  0  0  1
2011-06  2  0  0  0
...

I've tried the following, but it doesn't quite give me what I want.

grouped_series = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()

I know which tag exists etc. Any help will be greatly appreciated.

UPDATE (for people looking in the future):

Ended up keeping the datetime, instead of string format. Trust me, this will be better when plotting:

grouped_df = df.groupby([[ datetime.datetime(d.year, d.month, 1, 0, 0) for d in df.date], df.name]).size()
grouped_df = grouped_df.unstack().fillna(0)

回答1:


you could use unstack() and fillna() methods:

>>> g = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
>>> g
        tag
2011-2  A      2
2011-3  B      1
2011-4  C      1
2011-5  Z      1
2011-6  A      2
dtype: int64
>>> g.unstack().fillna(0)
tag     A  B  C  Z
2011-2  2  0  0  0
2011-3  0  1  0  0
2011-4  0  0  1  0
2011-5  0  0  0  1
2011-6  2  0  0  0


来源:https://stackoverflow.com/questions/27242442/using-pandas-groupby-and-size-count-to-generate-an-aggregated-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!