Rank by groupby column aggregate

被刻印的时光 ゝ submitted on 2021-02-07 17:33:46

Question


I want to create a column manager_rank that ranks each manager by the sum of their returns. I have come up with one solution, posted below, but I was hoping someone had something more elegant.

import pandas as pd
df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                 columns=['year', 'manager', 'return'])

Desired result:

   year manager  return  manager_rank
0  2012       A       1             2
1  2011       A       5             2
2  2012       B       4             1
3  2011       B       4             1

Answer 1:


df['ranking'] = df.groupby('manager')['return'].transform('sum').rank(ascending=False, method='dense')

   year manager  return  ranking
0  2012       A       1        2
1  2012       B       4        1
2  2011       A       5        2
3  2011       B       4        1
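
The method='dense' argument matters here because transform broadcasts each manager's total onto every row, so the ranked column contains ties by construction. A minimal sketch, using the question's sample data, of how the default ranking method differs:

```python
import pandas as pd

df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                  columns=['year', 'manager', 'return'])

# Broadcast each manager's total back onto every row: [6, 8, 6, 8]
totals = df.groupby('manager')['return'].transform('sum')

# The default method='average' averages the positions of tied rows
average_rank = totals.rank(ascending=False)
# method='dense' gives tied rows the same rank and leaves no gaps
dense_rank = totals.rank(ascending=False, method='dense')

print(average_rank.tolist())  # [3.5, 1.5, 3.5, 1.5]
print(dense_rank.tolist())    # [2.0, 1.0, 2.0, 1.0]
```

Note that rank returns floats; append .astype(int) if integer ranks are wanted.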



Answer 2:


You can drop to_frame and pass name to reset_index instead:

manager_rank = (df.groupby('manager')
                  .sum()
                  ['return']
                  .rank(ascending=False)
                  .reset_index(name='manager_rank')
                )

df = pd.merge(df, manager_rank, on='manager')
print(df)

   year manager  return  manager_rank
0  2012       A       1             2
1  2011       A       5             2
2  2012       B       4             1
3  2011       B       4             1
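
If the merge is only there to look up one value per manager, Series.map is a possible alternative. This is a sketch of a different technique from the one in this answer, not a correction of it; the variable name ranks is just for illustration:

```python
import pandas as pd

df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                  columns=['year', 'manager', 'return'])

# One rank per manager, indexed by manager: A -> 2.0, B -> 1.0
ranks = df.groupby('manager')['return'].sum().rank(ascending=False)

# Look up each row's manager in that Series; the row order of df is unchanged
df['manager_rank'] = df['manager'].map(ranks)
print(df)
```

Because no join is involved, the rows stay in their original order. If two managers can have the same total, pass method='dense' to rank to avoid fractional ranks.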



Answer 3:


How about extending the method proposed by @Stefan to rank by the final cumulative return of each manager? Returns don't sum, they compound.

df['total_return'] = (df
                      .groupby('manager')['return']
                      .transform(lambda group: (1 + group / 100.).cumprod().iat[-1])) - 1
df['ranking'] = df.total_return.rank(ascending=False, method='dense')

>>> df
   year manager  return  ranking  total_return
0  2012       A       1        2        0.0605
1  2012       B       4        1        0.0816
2  2011       A       5        2        0.0605
3  2011       B       4        1        0.0816
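
As a check on the arithmetic: manager A compounds to 1.01 * 1.05 - 1 = 0.0605 and manager B to 1.04 * 1.04 - 1 = 0.0816. Since only the last element of the cumulative product is used, the same figure can be obtained with a plain product. A minimal sketch, assuming (as above) that return is a percentage per period:

```python
import pandas as pd

df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                  columns=['year', 'manager', 'return'])

# Growth factor per period is 1 + r/100 (assumes 'return' is in percent)
growth = 1 + df['return'] / 100.0

# The product of the factors within each manager is the compounded growth
df['total_return'] = growth.groupby(df['manager']).transform('prod') - 1
df['ranking'] = df['total_return'].rank(ascending=False, method='dense')

print(df['total_return'].round(4).tolist())  # [0.0605, 0.0816, 0.0605, 0.0816]
print(df['ranking'].tolist())                # [2.0, 1.0, 2.0, 1.0]
```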



Answer 4:


One-liner:

manager_rank = (df.groupby('manager')
                  .sum()
                  ['return']
                  .rank(ascending=False)
                  .to_frame(name='manager_rank')
                  .reset_index()
                )

df = pd.merge(df, manager_rank, on='manager')

Step By Step Details:

1. Group by manager, using sum as the aggregation function

In [8]: df.groupby('manager').sum()
Out[8]: 
         return
manager        
A             6
B             8

2. Use rank() to assign ranks to managers

In [9]: df.groupby('manager').sum().rank()
Out[9]: 
         return
manager        
A             1
B             2

In [10]: df.groupby('manager').sum().rank(ascending=False)
Out[10]: 
         return
manager        
A             2
B             1

3. Convert the ranked 'return' column to a DataFrame with a manager_rank column

In [13]: df.groupby('manager').sum().rank(ascending=False)['return'].to_frame(name='manager_rank')
Out[13]: 
         manager_rank
manager              
A                   2
B                   1

4. Join the result of the steps above with the original DataFrame

df = pd.merge(df, manager_rank, on='manager')
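
The four steps can be collected into one small function. This is only a sketch: add_manager_rank is a hypothetical helper name, and the how='left' and validate='many_to_one' arguments are additions that are not part of the original answer. Selecting 'return' before calling sum() is also a deliberate change, since on recent pandas versions a bare sum() may aggregate the string year column as well.

```python
import pandas as pd

def add_manager_rank(df):
    """Hypothetical helper: append a manager_rank column based on summed returns."""
    manager_rank = (df.groupby('manager')['return']   # step 1: group and pick the column
                      .sum()                          # step 1: aggregate
                      .rank(ascending=False)          # step 2: rank, highest total first
                      .reset_index(name='manager_rank'))  # step 3: named column
    # Step 4: how='left' keeps every original row in its original order;
    # validate raises MergeError if a manager appears more than once on the right
    return pd.merge(df, manager_rank, on='manager', how='left', validate='many_to_one')

df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 4], ['2011', 'A', 5], ['2011', 'B', 4]],
                  columns=['year', 'manager', 'return'])
result = add_manager_rank(df)
print(result)
```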


Source: https://stackoverflow.com/questions/34498930/rank-by-grouby-column-aggregate
