Finding most common values with Pandas GroupBy and value_counts

隐身守侯 提交于 2019-12-24 04:20:30

问题


I am working with two columns in a table.

+-------------+--------------------------------------------------------------+
|  Area Name  |                       Code Description                       |
+-------------+--------------------------------------------------------------+
| N Hollywood | VIOLATION OF RESTRAINING ORDER                               |
| N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| N Hollywood | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT               |
| Southeast   | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT               |
| West Valley | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| West Valley | CRIMINAL THREATS - NO WEAPON DISPLAYED                       |
| 77th Street | RAPE, FORCIBLE                                               |
| Foothill    | CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060 |
| N Hollywood | VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114 |
+-------------+--------------------------------------------------------------+

I'm using the Groupby and value_counts to find Code Descriptions by Area Name.

df.groupby(['Area Name'])['Code Description'].value_counts()

Is there a way to view only the top 'n' values per Area Name? If I append .nlargest(3) to the code above it only returns result for one Area Name.

+---------------------------------------------------------------------------------+
| Wilshire     SHOPLIFTING-GRAND THEFT ($950.01 & OVER)                         7 |
+---------------------------------------------------------------------------------+

回答1:


Use head in each group from the results of value_counts:

df.groupby('Area Name')['Code Description'].apply(lambda x: x.value_counts().head(3))

Output:

Area Name                                                                
77th Street  RAPE, FORCIBLE                                                  1
Foothill     CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060    1
N Hollywood  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
             VIOLATION OF RESTRAINING ORDER                                  1
             ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
Southeast    ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
West Valley  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
Name: Code Description, dtype: int64



回答2:


You can perform a double groupby:

s = df.groupby('Area Name')['Code Description'].value_counts()
res = s.groupby('Area Name').nlargest(3).reset_index(level=1, drop=True)

print(res)

Area Name    Code Description                                            
77th Street  RAPE, FORCIBLE                                                  1
Foothill     CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060    1
N Hollywood  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
             ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
             VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114    1
Southeast    ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                  1
West Valley  CRIMINAL THREATS - NO WEAPON DISPLAYED                          2
Name: Code Description, dtype: int64


来源:https://stackoverflow.com/questions/50592762/finding-most-common-values-with-pandas-groupby-and-value-counts

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!