Pandas: Calculate mean leaving out own row's value

Submitted by 大憨熊 on 2019-12-02 04:18:30

You could group by col1 and transform with the mean, then subtract each row's own value from its group mean:

df['col2'] = df.groupby('col1').col2.transform('mean').sub(df.col2)
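To see what that one-liner produces, here is a minimal sketch on the sample frame used further down in this thread. The column name `diff_from_mean` is my own, chosen so that col2 is not overwritten. Note that the result is the group mean minus the row's own value, i.e. a deviation from the group mean, not a mean recomputed without that row.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", "b", "a", "b", "a"],
                   'col2': [0, 4, 3, -5, 3, 4]})

# transform('mean') broadcasts each group's mean back onto its rows,
# so the result has the same index as df
group_mean = df.groupby('col1')['col2'].transform('mean')

# 'diff_from_mean' is a hypothetical column name used for illustration
df['diff_from_mean'] = group_mean.sub(df['col2'])
print(df)
```

Group a has mean 0.75 and group b has mean 3.0, so the first row (a, 0) gets 0.75 and the second row (a, 4) gets -3.25.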

Thanks for all your input. I ended up using the approach linked to by @VnC.

Here's how I solved it:

import pandas as pd

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)

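# Per-group mean and count of col2, merged back onto every row
group_summary = df.groupby('col1', as_index=False)['col2'].agg(['mean', 'count'])
df = pd.merge(df, group_summary, on='col1')

# count * mean is the group total; subtracting the row's own value leaves the sum of the other rows
df['other_sum'] = df['count'] * df['mean'] - df['col2']
# Mean of the other rows in the same group
df['result'] = df['other_sum'] / (df['count'] - 1)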

Check out the final result:

df['result']

Which prints:

Out: 
0    1.000000
1   -0.333333
2    2.666667
3   -0.333333
4    3.000000
5    3.000000
Name: result, dtype: float64
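For what it's worth, the same "group total minus own value, divided by count minus one" idea can probably be written without the merge step. This is only a sketch, not the approach the poster linked to; the intermediate names `group_sum` and `group_count` are my own.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", "b", "a", "b", "a"],
                   'col2': [0, 4, 3, -5, 3, 4]})

grouped = df.groupby('col1')['col2']
group_sum = grouped.transform('sum')      # group total, repeated on each row
group_count = grouped.transform('count')  # group size, repeated on each row

# Mean of the other rows in the same group.
# A group with a single row gives 0 / 0, which pandas reports as NaN.
df['result'] = (group_sum - df['col2']) / (group_count - 1)
print(df['result'])
```

Because nothing is merged, the rows stay in the original order of df, so the values line up with the input rows: 1.0, -0.333333, 3.0, 2.666667, 3.0, -0.333333.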

Edit: I previously had some trouble with column names, but I fixed it using this answer.
