Pandas: Group by a column that meets a condition

纵饮孤独 posted on 2020-07-08 11:22:07

Question


I have a data set with three columns: rating, breed, and dog.

import pandas as pd
dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}

df = pd.DataFrame(data=dogs)

I would like to calculate the mean rating per breed, but only for rows where dog is True. This is the expected output:

  breed     rating
0 Chihuahua 8.5   
1 Dalmatian 10.0  

This has been my attempt:

df.groupby('breed')['rating'].mean().where(dog == True)

And this is the error that I get:

NameError: name 'dog' is not defined

Whenever I try to add the where condition, I only get errors. Can anyone advise a solution? TIA


Answer 1:


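Once you group by breed and select the rating column, the dog column no longer exists in the result you are working with. Even if it did, a bare dog is not how you would access it: Python looks for a variable named dog, finds none, and raises the NameError. You would need df['dog'] or df.dog instead.

Filter your dataframe first, then use groupby with mean: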

df[df.dog].groupby('breed')['rating'].mean().reset_index()

       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0
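
To make the explanation above concrete, here is a minimal sketch that reuses the data from the question. It inspects what the grouped result actually contains, then applies the filter-first approach with an explicit boolean mask. The variable names grouped, mask, and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
    'dog': [True, True, True, False],
    'rating': [8.0, 9.0, 10.0, 7.0],
})

# Grouping by 'breed' and selecting 'rating' yields a Series indexed by breed.
# The 'dog' column is not part of this result, so it cannot be filtered on here.
grouped = df.groupby('breed')['rating'].mean()
print(type(grouped).__name__)  # Series
print(grouped.index.name)      # breed

# Filter first: df['dog'] is already a boolean Series, so it works as a row mask.
mask = df['dog']
result = df[mask].groupby('breed')['rating'].mean().reset_index()
print(result)
```

Because the mask is applied before grouping, the Sphynx row never reaches the aggregation step.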



Answer 2:


An alternative solution is to make dog one of your grouping keys, then filter by dog in a separate step. This is more efficient if you do not want to lose the aggregated data for non-dogs.

res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

print(res)

     dog      breed  rating
0  False     Sphynx     7.0
1   True  Chihuahua     8.5
2   True  Dalmatian    10.0

print(res[res['dog']])

    dog      breed  rating
1  True  Chihuahua     8.5
2  True  Dalmatian    10.0
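
As a rough sketch of why this can be convenient, the single aggregated frame can be split into both subsets without grouping again. The names dogs_only and non_dogs are hypothetical, introduced here only for illustration; dropping the dog column and resetting the index is one way to match the output layout requested in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
    'dog': [True, True, True, False],
    'rating': [8.0, 9.0, 10.0, 7.0],
})

# Aggregate once, with 'dog' as part of the grouping key.
res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

# Keep only the dog rows and the two columns the question asked for.
dogs_only = res.loc[res['dog'], ['breed', 'rating']].reset_index(drop=True)

# The non-dog aggregates are still available from the same result.
non_dogs = res.loc[~res['dog'], ['breed', 'rating']].reset_index(drop=True)

print(dogs_only)
print(non_dogs)
```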


Source: https://stackoverflow.com/questions/50662469/pandas-group-by-a-column-that-meets-a-condition
