Question
I want to calculate means by group, leaving out the value of the row itself.
import pandas as pd
d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)
I know how to return means by group:
df.groupby('col1').agg({'col2': 'mean'})
Which returns:
Out[247]:
      col2
col1
a     0.75
b     3.00
But what I want is the mean by group, leaving out the row's own value. E.g. for the first row:
df.query('col1 == "a"')[1:4].mean(numeric_only=True)
which returns:
Out[251]:
col2 1.0
dtype: float64
Edit:
The expected output is a dataframe of the same format as df above, with a column mean_excl_own which is the mean across all other members in the group, excluding the row's own value.
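To make the target concrete, here is a deliberately slow, row-by-row sketch of that definition. The loop and the names same_group and not_self are only for illustration and are not meant as a proposed solution:

```python
import pandas as pd

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)

mean_excl_own = []
for idx, row in df.iterrows():
    # Rows that share this row's group...
    same_group = df['col1'] == row['col1']
    # ...excluding the row itself
    not_self = df.index != idx
    mean_excl_own.append(df.loc[same_group & not_self, 'col2'].mean())

df['mean_excl_own'] = mean_excl_own
```

For the first row this reproduces the value 1.0 from the query above.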
Answer 1:
You could GroupBy col1 and transform with the mean. Then subtract the value of a given row from the mean:
df['col2'] = df.groupby('col1').col2.transform('mean').sub(df.col2)
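Subtracting the row's value from the group mean gives the row's deviation from that mean rather than the mean of the other rows. A minimal sketch of a transform-based variant that yields the leave-one-out mean instead, assuming every group has at least two rows. The column name mean_excl_own comes from the question; grp, group_sum and group_count are illustrative names:

```python
import pandas as pd

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)

grp = df.groupby('col1')['col2']
# Broadcast each group's sum and size back onto the original rows
group_sum = grp.transform('sum')
group_count = grp.transform('count')

# Remove the row's own value from the sum, then average over the remaining rows
df['mean_excl_own'] = (group_sum - df['col2']) / (group_count - 1)
```

Because transform returns a result aligned with the original index, the rows stay in their original order and col2 is left untouched.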
Answer 2:
Thanks for all your input. I ended up using the approach linked to by @VnC.
Here's how I solved it:
import pandas as pd
d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)
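# Per-group mean and row count, with col1 restored as a regular column for the merge
group_summary = df.groupby('col1')['col2'].agg(['mean', 'count']).reset_index()
# A left merge keeps the rows in their original order
df = pd.merge(df, group_summary, on='col1', how='left')
# Group sum (count * mean) minus the row's own value
df['other_sum'] = df['count'] * df['mean'] - df['col2']
# Mean of the remaining rows in the group
df['result'] = df['other_sum'] / (df['count'] - 1)
Check out the final result:
df['result']
Which prints:
Out:
0    1.000000
1   -0.333333
2    3.000000
3    2.666667
4    3.000000
5   -0.333333
Name: result, dtype: float64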
Edit: I previously had some trouble with column names, but I fixed it using this answer.
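One case the division by count - 1 leaves open is a group with a single row, where there are no other members to average. A small sketch of one way to handle it, using a hypothetical frame with a one-row group "c" and marking such rows as NaN. The names grp, group_sum, group_count and loo are illustrative:

```python
import pandas as pd

# Hypothetical data: group "c" has only one member
df = pd.DataFrame({'col1': ["a", "a", "c"], 'col2': [1, 3, 7]})

grp = df.groupby('col1')['col2']
group_sum = grp.transform('sum')
group_count = grp.transform('count')

loo = (group_sum - df['col2']) / (group_count - 1)
# Keep the value only where the group has other members; otherwise NaN
df['mean_excl_own'] = loo.where(group_count > 1)
```

Whether NaN is the right placeholder depends on how the column is used downstream.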
Source: https://stackoverflow.com/questions/55709649/pandas-calculate-mean-leaving-out-own-rows-value