Python pandas dataframe group by based on a condition

Submitted by 放肆的年华 on 2020-03-17 07:59:09

Question


My question is simple. I have a DataFrame, and I group it by a column and get the size of each group like this:

df.groupby('column').size()

The problem is that I only want the groups whose size is greater than X. Can I do this with a lambda function or something similar? I have already tried this:

df.groupby('column').size() > X

but it only prints a Series of True and False values, one per group.


Answer 1:


The result of groupby(...).size() is a regular Series indexed by group label, so just filter it with a boolean mask as usual:

 >>> import pandas as pd

 >>> df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
 >>> after = df.groupby('a').size()
 >>> after
 a
 a    3
 b    2
 c    1
 d    1
 dtype: int64

 >>> after[after > 2]
 a
 a    3
 dtype: int64
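
Since the question asks about a lambda, here is a minimal sketch of the same mask-based filtering written as one chained expression. It relies on `.loc` accepting a callable; the threshold `X = 1` is an assumed value chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
X = 1  # assumed threshold, for illustration only

# Group sizes as a Series indexed by the values of column 'a'
sizes = df.groupby('a').size()

# The comparison alone only produces a boolean Series (the True/False output)
mask = sizes > X

# Passing a lambda to .loc applies that mask without a temporary variable
large = df.groupby('a').size().loc[lambda s: s > X]
print(large)
```

Indexing with `sizes[mask]` gives the same result; the lambda form is just convenient when the whole thing should stay a single chained expression.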



Answer 2:


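If you want the rows of the original DataFrame that belong to groups with more than X rows (rather than the sizes themselves), use filter. Note that each group is passed in as a DataFrame, whose .size attribute counts every element (rows times columns), so compare len(group) instead:

df.groupby('column').filter(lambda group: len(group) > X)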
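
As a rough sketch of how filtering whole groups behaves on a DataFrame with more than one column, the example below keeps the rows of groups that have more than X rows, using `len(group)` as the row count. The `value` column and `X = 2` are assumptions made up for the demonstration. It also shows an equivalent mask built with `transform('size')`, which avoids calling a Python lambda once per group.

```python
import pandas as pd

df = pd.DataFrame({
    'column': ['a', 'b', 'a', 'a', 'b', 'c', 'd'],
    'value': [1, 2, 3, 4, 5, 6, 7],  # hypothetical extra column
})
X = 2  # assumed threshold, for illustration only

# Keep every row whose group has more than X rows
kept = df.groupby('column').filter(lambda group: len(group) > X)

# Same selection via a per-row group size used as a boolean mask
group_size = df.groupby('column')['column'].transform('size')
kept_via_mask = df[group_size > X]

print(kept)
```

Both results contain the original rows (with their original index), not the group sizes, so this answers a slightly different need than filtering the output of `size()`.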


Source: https://stackoverflow.com/questions/31303417/python-pandas-dataframe-group-by-based-on-a-condition
