Question
I have a data set with three columns: rating, breed, and dog.
import pandas as pd
dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)
I would like to calculate the mean rating per breed where dog is True. This would be the expected output:
       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0
This has been my attempt:
df.groupby('breed')['rating'].mean().where(dog == True)
And this is the error that I get:
NameError: name 'dog' is not defined
Whenever I try to add the where condition, I only get errors. Can anyone advise a solution? TIA
Answer 1:
Once you group by and select a column, your dog column no longer exists in the context you have selected (and even if it did, you are not accessing it correctly: a bare dog is an undefined Python name, hence the NameError).
Filter your DataFrame first, then use groupby with mean:
df[df.dog].groupby('breed')['rating'].mean().reset_index()
       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0
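To see why the original attempt fails, it can help to look at what the grouped result actually contains. This is a minimal sketch that rebuilds the df from the question. The names means, mask, and result are only illustrative, and as_index=False is used here as an alternative to reset_index().

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Grouping and selecting 'rating' yields a Series indexed by breed.
# The 'dog' column is gone at this point, so there is nothing to filter on.
means = df.groupby('breed')['rating'].mean()
print(type(means).__name__)
print(list(means.index))

# Filter the rows first with a boolean mask, then aggregate.
mask = df['dog']  # the column already holds booleans, so "== True" is not needed
result = df[mask].groupby('breed', as_index=False)['rating'].mean()
print(result)
```

Because the mask is applied before grouping, Sphynx never enters the aggregation at all.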
Answer 2:
An alternative solution is to make dog one of your grouper keys, then filter by dog in a separate step. This is more efficient if you do not want to lose the aggregated data for non-dogs.
res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()
print(res)
     dog      breed  rating
0  False     Sphynx     7.0
1   True  Chihuahua     8.5
2   True  Dalmatian    10.0
print(res[res['dog']])
    dog      breed  rating
1  True  Chihuahua     8.5
2  True  Dalmatian    10.0
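If both halves are needed, the single aggregated frame can be split afterwards. This is a rough sketch under the same setup as the question. The names dog_means and other_means are made up for illustration, and dropping the dog column is just one way to get back to the two-column layout the question asked for.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Aggregate once, keeping 'dog' as a grouping key.
res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

# Split the aggregated result by the boolean 'dog' column.
dog_means = res[res['dog']].drop(columns='dog').reset_index(drop=True)
other_means = res[~res['dog']].drop(columns='dog').reset_index(drop=True)

print(dog_means)
print(other_means)
```

Here the groupby runs only once, and the means for non-dogs stay available in other_means.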
Source: https://stackoverflow.com/questions/50662469/pandas-group-by-a-column-that-meets-a-condition