pandas group by year, rank by sales column, in a dataframe with duplicate data

抹茶落季 2020-12-05 01:07

I would like to create a rank within each year (so in 2012, Manager B is ranked 1; in 2011, Manager B is ranked 1 again). I struggled with the pandas rank function for a while and DO NOT want

1 answer
  • 2020-12-05 01:27

    It sounds like you want to group by the Year, then rank the Returns in descending order.

    import pandas as pd
    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
    print(s)
    

    yields

       Year Manager  Return  Rank
    0  2012       A       3   2.0
    1  2012       B       8   1.0
    2  2011       A      20   2.0
    3  2011       B      30   1.0
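Since the question mentions duplicate data, it may help to see how ties are ranked within a group. The sketch below uses hypothetical data (Manager C is not in the original question) and compares the default `method='average'` with `method='dense'`; which one is appropriate depends on how you want ties reported.

```python
import pandas as pd

# Hypothetical data: managers B and C tie in 2012 with a Return of 8.
df = pd.DataFrame(
    [['2012', 'A', 3], ['2012', 'B', 8], ['2012', 'C', 8],
     ['2011', 'A', 20], ['2011', 'B', 30]],
    columns=['Year', 'Manager', 'Return'],
)

# Default method='average': tied rows share the mean of the positions they occupy.
df['RankAvg'] = df.groupby('Year')['Return'].rank(ascending=False)

# method='dense': tied rows share one rank and the next rank is not skipped.
# rank() returns floats, so cast to int if whole numbers are preferred.
df['RankDense'] = (
    df.groupby('Year')['Return'].rank(ascending=False, method='dense').astype(int)
)

print(df)
```

With the averaged method, B and C in 2012 both receive 1.5 and A receives 3.0; with the dense method they receive 1, 1 and 2.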
    

    To address the OP's revised question: The error message

    ValueError: cannot reindex from a duplicate axis
    

    occurs when trying to groupby/rank on a DataFrame whose index contains duplicate labels. You can avoid the problem by giving s unique index values when the rows are combined:

    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
    b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
    # DataFrame.append was removed in pandas 2.0; before that, s.append(b, ignore_index=True) did the same.
    s = pd.concat([s, b], ignore_index=True)
    print(s)
    

    yields

       Year Manager  Return
    0  2012       A       3
    1  2012       B       8
    2  2011       A      20
    3  2011       B      30
    4  2012       A       3
    5  2012       B       8
    6  2011       A      20
    7  2011       B      30
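To see why the index matters, one can compare the labels produced with and without `ignore_index`. This is a minimal sketch using smaller frames than the ones above; the names `kept` and `renumbered` are illustrative only.

```python
import pandas as pd

cols = ['Year', 'Manager', 'Return']
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8]], columns=cols)
b = pd.DataFrame([['2011', 'A', 20], ['2011', 'B', 30]], columns=cols)

# Without ignore_index, each frame keeps its own labels, so 0 and 1 repeat.
kept = pd.concat([s, b])

# With ignore_index=True, the result is relabelled 0..n-1.
renumbered = pd.concat([s, b], ignore_index=True)

print(kept.index.tolist(), kept.index.is_unique)              # [0, 1, 0, 1] False
print(renumbered.index.tolist(), renumbered.index.is_unique)  # [0, 1, 2, 3] True
```

Checking `index.is_unique` before ranking is a cheap way to confirm whether duplicate labels are present.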
    

    If you've already combined the rows without ignore_index, for example with

    s = pd.concat([s, b])  # s = s.append(b) before pandas 2.0
    

    then use reset_index to create a unique index:

    s = s.reset_index(drop=True)
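Putting the pieces together, the following sketch combines the two frames, repairs the index, and then ranks within each year. It uses `pd.concat` on the assumption that a recent pandas version is installed. Because every row is duplicated, each Return value is tied with its copy.

```python
import pandas as pd

cols = ['Year', 'Manager', 'Return']
rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=cols)
b = pd.DataFrame(rows, columns=cols)

# Combine without ignore_index, which leaves duplicate labels 0..3, 0..3.
s = pd.concat([s, b])

# Replace the index with a fresh 0..n-1 range; drop=True discards the old labels.
s = s.reset_index(drop=True)

# Every Return value appears twice per year, so the default
# method='average' assigns fractional ranks to the tied rows.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s)
```

If whole-number ranks are wanted for the duplicated rows, passing `method='dense'` or `method='min'` to `rank` is one option.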
    