Count values in dataframe based on entry

两盒软妹~` submitted on 2020-01-24 19:30:09

Question


I have a dataframe of the form:

category | value |
cat a    |x      |
cat a    |x      |
cat a    |y      |
cat b    |w      |
cat b    |z      |

I'd like to be able to return something like this (showing, for each category, the most common value and its frequency):

category | freq of most common value | most common value |
cat a    | 2                         | x                 |
cat b    | 1                         | w                 |  # it doesn't matter whether this is w or z
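
For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame from the table above. The variable name `df` matches what the answers use; the construction itself is not part of the original question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "category": ["cat a", "cat a", "cat a", "cat b", "cat b"],
    "value": ["x", "x", "y", "w", "z"],
})
print(df)
```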

Answer 1:


Use Series.value_counts with Series.head on each group inside a lambda function:

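# value_counts() sorts by frequency (descending), so head(1) keeps each group's most common value
df = (df.groupby('category', sort=False)['value']
        .apply(lambda x: x.value_counts().head(1))
        # name both index levels explicitly rather than relying on a default 'level_1' label
        .rename_axis(['category', 'most_common_value'])
        .reset_index(name='freq of most common value'))
print(df)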
  category most_common_value  freq of most common value
0    cat a                 x                          2
1    cat b                 w                          1
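
To see why `head(1)` is enough, it may help to look at a single group in isolation. The sketch below rebuilds the sample data and inspects only `cat a`; the names `cat_a`, `counts` and `top` are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["cat a", "cat a", "cat a", "cat b", "cat b"],
    "value": ["x", "x", "y", "w", "z"],
})

# Values belonging to one category only
cat_a = df.loc[df["category"] == "cat a", "value"]

# value_counts() returns counts sorted from most to least frequent
counts = cat_a.value_counts()

# The first row is therefore the most common value and its frequency
top = counts.head(1)
print(top.index[0], top.iloc[0])  # x 2
```

When several values tie for the top count, `head(1)` simply keeps whichever one `value_counts()` lists first, which is fine here because the question accepts either.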



Answer 2:


One approach is to group by both columns and take the size of each (category, value) pair, sort those counts, and keep the last row per category, which is the one with the highest frequency:

(df.groupby(['category', 'value'])
   .value.size()
   .sort_values()
   .groupby(level=0)
   .tail(1))

category  value
cat b      z        1
cat a      x        2
Name: value, dtype: int64
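
The result above is a Series with a two-level index. If a flat table like the one in the question is wanted, one possible follow-up is sketched below. The column names are taken from the question; the final sort by category and the variable name `top` are additions made for readability, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["cat a", "cat a", "cat a", "cat b", "cat b"],
    "value": ["x", "x", "y", "w", "z"],
})

top = (
    df.groupby(["category", "value"])
      .size()                      # count of each (category, value) pair
      .sort_values()               # ascending, so the largest count comes last
      .groupby(level="category")
      .tail(1)                     # last row per category = highest count
      .reset_index(name="freq of most common value")
      .rename(columns={"value": "most_common_value"})
      .sort_values("category", ignore_index=True)
)
print(top)
```

For `cat b` both values occur once, so either `w` or `z` may be reported depending on how the tie is ordered by the sort.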



Answer 3:


Here is a solution using crosstab:

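import pandas as pd

# Rows are categories, columns are values, cells hold occurrence counts
m = pd.crosstab(df['category'], df['value'])
# Row-wise max is the top frequency; idxmax gives the column (value) that attains it
m = m.max(axis=1).to_frame('freq of most common value').assign(most_common_value=m.idxmax(axis=1))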

print(m)

          freq of most common value most_common_value
category                                             
cat a                             2                 x
cat b                             1                 w
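
The intermediate crosstab is what makes this work, so here is a small sketch that prints it for the sample data before the max/idxmax step. The names `freq` and `most_common` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["cat a", "cat a", "cat a", "cat b", "cat b"],
    "value": ["x", "x", "y", "w", "z"],
})

# Contingency table: one row per category, one column per distinct value
m = pd.crosstab(df["category"], df["value"])
print(m)

# Largest count in each row, and the column label where it first occurs
freq = m.max(axis=1)
most_common = m.idxmax(axis=1)
print(freq.tolist(), most_common.tolist())  # [2, 1] ['x', 'w']
```

Because `idxmax` returns the first column holding the maximum, ties such as `w` and `z` in `cat b` resolve to the column that comes first in the crosstab's sorted column order.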


Source: https://stackoverflow.com/questions/59646659/count-values-in-dataframe-based-on-entry
