发表新帖

发表新帖

Normalize DataFrame by group

后端未结

关注

 4  689

梦谈多话 2021-02-07 02:04

Let\'s say that I have some data generated as follows:

N = 20
m = 3
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3

and t

4条回答

南旧 (楼主)

2021-02-07 02:35
The accepted answer works and is elegant. Unfortunately, for large datasets I think performance-wise using .transform() is much much slower than doing the less elegant following (illustrated with a single column 'a0'):
```
means_stds = df.groupby('indx')['a0'].agg(['mean','std']).reset_index()
df = df.merge(means_stds,on='indx')
df['a0_normalized'] = (df['a0'] - df['mean']) / df['std']
```
To do it for multiple columns you'll have to figure out the merge. My suggestion would be to flatten the multiindex columns from aggregation as in this answer and then merge and normalize for each column separately:
```
means_stds = df.groupby('indx')[['a0','a1']].agg(['mean','std']).reset_index()
means_stds.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in means_stds.columns]
df = df.merge(means_stds,on='indx')
for col in ['a0','a1']:
    df[col+'_normalized'] = ( df[col] - df[col+'|mean'] ) / df[col+'|std']
```
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...

热议问题