Pandas groupby value_count filter by frequency

Submitted by 断了今生、忘了曾经 on 2020-01-04 05:49:14

Question


I would like to filter out frequencies that are less than n; in my case, n is 2.

import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
df.groupby('A')['B'].value_counts()

A    B  
bar  no     4
     yes    1
foo  yes    3
     no     2
Name: B, dtype: int64
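It helps to note the shape of this result before filtering it: it is a Series, not a DataFrame, with the group key A and the counted value B as the two levels of a MultiIndex. A minimal sketch, rebuilding the same df so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 5,
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})

counts = df.groupby('A')['B'].value_counts()

# A Series whose index has two levels: the group key and the counted value.
print(type(counts).__name__)      # Series
print(list(counts.index.names))   # ['A', 'B']

# A single frequency is looked up with an (A, B) tuple.
print(counts.loc[('bar', 'no')])  # 4
```

Because it is a plain Series of integers, any boolean condition on its values can be used to select rows from it.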

Ideally I would like the results in a DataFrame like the one below (with the frequency of 1 excluded):

A    B      freq
bar  no     4
foo  yes    3
foo  no     2

I have tried

df.groupby('A')['B'].filter(lambda x: len(x) > 1)

but this does not work, apparently because groupby returns a Series.
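For context on that attempt: `filter` on a groupby keeps or drops whole groups. Grouped by A alone, the lambda receives all five B values of each group, so `len(x) > 1` is always true and nothing is removed. Grouping by both columns makes each group a single (A, B) pair, so `len(g)` becomes that pair's frequency. A minimal sketch, assuming the same df; note that this returns the original rows rather than a table of counts:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 5,
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})

# Grouped by 'A' only: each group has 5 rows, so every group passes.
kept_by_a = df.groupby('A')['B'].filter(lambda x: len(x) > 1)
print(len(kept_by_a))  # 10, nothing removed

# Grouped by ('A', 'B'): len(g) is the pair frequency, so the single
# ('bar', 'yes') row is dropped and the other 9 rows are kept.
kept_rows = df.groupby(['A', 'B']).filter(lambda g: len(g) >= 2)
print(len(kept_rows))  # 9
```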


Answer 1:


This can be done in one line with .loc and a callable:

df.groupby('A')['B'].value_counts().loc[lambda x : x>1].reset_index(name='count')
Out[530]: 
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
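The one-liner above hard-codes the threshold as `x>1`. Here is a minimal sketch of the same approach with the question's n as a parameter. The function name `frequent_pairs` and its arguments are hypothetical, not part of pandas:

```python
import pandas as pd

def frequent_pairs(df, group_col, value_col, n):
    """Hypothetical helper: (group, value, count) rows whose count is >= n."""
    counts = df.groupby(group_col)[value_col].value_counts()
    # .loc calls the lambda with the Series itself and uses the boolean
    # Series it returns as the row mask.
    return counts.loc[lambda s: s >= n].reset_index(name='count')

df = pd.DataFrame({'A': ['foo', 'bar'] * 5,
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})

print(frequent_pairs(df, 'A', 'B', 2))
```

Passing `name='count'` to `reset_index` gives the counts column an explicit name, so it cannot collide with the index level B when both become columns.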



Answer 2:


You can store the value_counts output and then filter it with a boolean mask:

In[3]:
counts = df.groupby('A')['B'].value_counts()
counts[counts>=2]

Out[3]: 
A    B  
bar  no     4
foo  yes    3
     no     2
Name: B, dtype: int64

To get your desired output, call reset_index and give the new column a name:

In[21]:
counts[counts>=2].reset_index(name='count')

Out[21]: 
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
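A related variant, offered as an alternative rather than as either answerer's method: grouping by both columns and calling `size()` counts each (A, B) pair, giving the same numbers as `value_counts`. The sketch below applies the same mask and names the column `freq` to match the header in the question:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 5,
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
n = 2  # minimum frequency to keep, as in the question

# One count per (A, B) pair, indexed by a two-level MultiIndex.
sizes = df.groupby(['A', 'B']).size()

# Keep pairs seen at least n times and turn the result into a DataFrame.
result = sizes[sizes >= n].reset_index(name='freq')
print(result)
```

Unlike `value_counts`, `size()` leaves the rows in key order rather than sorted by count. You can append `.sort_values('freq', ascending=False)` if you need the ordering shown in the question.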


Source: https://stackoverflow.com/questions/50117068/pandas-groupby-value-count-filter-by-frequency
