Question
I would like to filter out the frequencies that are less than n; in my case, n is 2.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
df.groupby('A')['B'].value_counts()
A    B
bar  no     4
     yes    1
foo  yes    3
     no     2
Name: B, dtype: int64
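The object printed above is a Series whose index has two levels, A and B, so each frequency is addressed by an (A, B) pair rather than by a group. As a sketch, assuming a reasonably recent pandas, the same numbers can be cross-checked with `groupby(['A', 'B']).size()`, which orders rows by the keys instead of by frequency:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

# Frequencies per (A, B) pair, sorted by count within each A group.
counts = df.groupby('A')['B'].value_counts()
print(counts.index.names)  # the two index levels of the result

# Row counts per (A, B) pair, ordered by the (A, B) keys themselves.
sizes = df.groupby(['A', 'B']).size()

# Once both are ordered by index, they hold the same numbers.
print(counts.sort_index().tolist() == sizes.tolist())
```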
Ideally, I would like the result as a DataFrame like the one below (the frequency of 1 is excluded):
A    B    freq
bar  no   4
foo  yes  3
foo  no   2
I have tried
df.groupby('A')['B'].filter(lambda x: len(x) > 1)
but this fails, apparently because groupby returns a Series.
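One likely explanation for why that attempt does not give per-value frequencies: `SeriesGroupBy.filter` decides whether to keep or drop each whole group, and the `x` passed to the lambda is that group's slice of column B. A minimal sketch, assuming the `df` from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

# x is the group's slice of column B, so len(x) is the group size
# (5 for both 'foo' and 'bar'), not the frequency of each value.
kept = df.groupby('A')['B'].filter(lambda x: len(x) > 1)

# Both groups pass the test, so all original rows of B come back.
print(len(kept))
```

Counting has to happen first (for example with `value_counts`), and the filter then has to be applied to those counts rather than to the groups.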
Answer 1:
This can be done in one line with .loc:
df.groupby('A')['B'].value_counts().loc[lambda x: x > 1].reset_index(name='count')
Out[530]:
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
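In Answer 1, the callable passed to `.loc` is called with the Series it is applied to (here the value_counts result) and returns a boolean mask. A small sketch, assuming the `df` from the question, checking that it selects the same rows as an explicit mask on a stored variable (for integer counts, `> 1` and `>= 2` are the same condition):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

counts = df.groupby('A')['B'].value_counts()

# The lambda receives `counts` itself and returns a boolean mask.
via_callable = counts.loc[lambda x: x > 1]

# Equivalent explicit boolean indexing on the stored Series.
via_mask = counts[counts > 1]

print(via_callable.equals(via_mask))
```

The callable form is mainly convenient inside a method chain, where there is no intermediate variable to refer to.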
Answer 2:
You can store the value_counts output and then filter it:
In[3]:
counts = df.groupby('A')['B'].value_counts()
counts[counts>=2]
Out[3]:
A    B
bar  no     4
foo  yes    3
     no     2
Name: B, dtype: int64
To get your desired output, call reset_index and give the new column a name:
In[21]:
counts[counts>=2].reset_index(name='count')
Out[21]:
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
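Since the question is phrased in terms of a general threshold n, the store-then-filter approach from Answer 2 can be wrapped in a small function. A sketch only: `frequent_pairs` is a hypothetical helper name, and the output column is named `freq` to match the header requested in the question:

```python
import pandas as pd


def frequent_pairs(df: pd.DataFrame, group_col: str, value_col: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: per-group value counts, keeping counts >= n."""
    counts = df.groupby(group_col)[value_col].value_counts()
    # Drop the frequencies that are less than n.
    return counts[counts >= n].reset_index(name='freq')


df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

print(frequent_pairs(df, 'A', 'B', 2))
```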
Source: https://stackoverflow.com/questions/50117068/pandas-groupby-value-count-filter-by-frequency