Pandas Groupby: How to Show Zero Counts in DataFrame

此生再无相见时 submitted on 2019-12-18 07:25:29

Question


I have the following Pandas dataframe:

Name   | EventSignupNo | Attended | Points
Smith  | 0145          | Y        | 20.24
Smith  | 0174          | Y        | 29.14
Smith  | 0239          | N        | 0
Adams  | 0145          | N        | 0
Adams  | 0174          | Y        | 33.43
Morgan | 0239          | Y        | 31.23
Morgan | 0244          | Y        | 23.15

What I'd like is, for each person, a count of the events attended and not attended, plus the sum of their points. So I do a groupby:

df.groupby(["Name", "Attended"]).agg({"Attended": "count", "Points": "sum"}).rename(columns={"Attended": "Count"}).reset_index()

which would give me something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38

but I'd want something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38
Morgan | N        | 0     | 0

I tried playing around with pd.MultiIndex to try to fill the missing zero count, but to no avail. I've read the other similar questions but I'm having trouble dealing with the continuous Points column using MultiIndex. Any idea how to do this?
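
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the table above and runs an equivalent groupby. A few things are assumptions rather than part of the original post: EventSignupNo is stored as strings so the leading zeros survive, the variable name summary is made up for illustration, and the count is taken over Points instead of Attended, because recent pandas versions may refuse to aggregate a column that is also a grouping key. Named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the table in the question.
# EventSignupNo is kept as strings (an assumption) to preserve leading zeros.
df = pd.DataFrame({
    "Name": ["Smith", "Smith", "Smith", "Adams", "Adams", "Morgan", "Morgan"],
    "EventSignupNo": ["0145", "0174", "0239", "0145", "0174", "0239", "0244"],
    "Attended": ["Y", "Y", "N", "N", "Y", "Y", "Y"],
    "Points": [20.24, 29.14, 0, 0, 33.43, 31.23, 23.15],
})

# Count rows and sum Points for each (Name, Attended) pair that occurs.
# Counting "Points" rather than "Attended" is an assumption made here so the
# aggregated column is not one of the grouping keys.
summary = (
    df.groupby(["Name", "Attended"], sort=False)
      .agg(Count=("Points", "count"), Points=("Points", "sum"))
      .reset_index()
)

# Only combinations present in the data appear, so Morgan / N is missing.
print(summary)
```

The result has five rows: groupby only emits groups it actually sees, which is why the Morgan / N row with a zero count never shows up.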


Answer 1:


You can do this with groupby + agg. To get your exact output, with both Y and N listed for every name, you also need to reindex against the full product of names and attendance values:

g = df.groupby(['Name', 'Attended'], sort=False).Points.agg(['count', 'sum'])

g
                 count    sum
Name   Attended              
Smith  Y             2  49.38
       N             1   0.00
Adams  N             1   0.00
       Y             1  33.43
Morgan Y             2  54.38

idx = pd.MultiIndex.from_product([g.index.levels[0], ['Y', 'N']])

idx
MultiIndex(levels=[['Adams', 'Morgan', 'Smith'], ['N', 'Y']],
           labels=[[2, 2, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0]])


g.reindex(idx, fill_value=0)

          count    sum
Smith  Y      2  49.38
       N      1   0.00
Adams  Y      1  33.43
       N      1   0.00
Morgan Y      2  54.38
       N      0   0.00
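
Putting the steps above together, a self-contained sketch might look like the following. It is not the answer's exact code: the names are taken from df["Name"].unique() instead of g.index.levels[0] (an assumption made here because unique() is documented to return values in order of appearance, while the ordering of index levels can differ between pandas versions), the index levels are given names so that reset_index yields readable columns, and the variable name result is hypothetical.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Smith", "Smith", "Smith", "Adams", "Adams", "Morgan", "Morgan"],
    "EventSignupNo": ["0145", "0174", "0239", "0145", "0174", "0239", "0244"],
    "Attended": ["Y", "Y", "N", "N", "Y", "Y", "Y"],
    "Points": [20.24, 29.14, 0, 0, 33.43, 31.23, 23.15],
})

# Count and sum Points per (Name, Attended) pair that occurs in the data.
g = df.groupby(["Name", "Attended"], sort=False)["Points"].agg(["count", "sum"])

# Build every (Name, Attended) combination, including ones absent from df.
# Assumption: names come from df["Name"].unique() (order of appearance)
# rather than g.index.levels[0] as in the answer.
idx = pd.MultiIndex.from_product(
    [df["Name"].unique(), ["Y", "N"]], names=["Name", "Attended"]
)

# Reindex against the full product; missing pairs get count 0 and sum 0.
result = (
    g.reindex(idx, fill_value=0)
     .rename(columns={"count": "Count", "sum": "Points"})
     .reset_index()
)
print(result)
```

Because fill_value=0 is applied to both columns, the missing Morgan / N pair comes back with a Count of 0 and Points of 0, which matches the layout requested in the question.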


Source: https://stackoverflow.com/questions/47372181/pandas-groupby-how-to-show-zero-counts-in-dataframe
