Pandas rank by multiple columns

小鲜肉 2020-12-16 19:05

I am trying to rank a pandas DataFrame based on two columns. I can rank it based on one column, but how can I rank it based on two columns? 'SaleCount', then 'TotalReve

5 answers
  •  -上瘾入骨i
    2020-12-16 19:29

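    pd.factorize assigns an integer code to each distinct element of an iterable, numbering them in order of first appearance. So we only need to sort the rows in the order we'd like and then factorize; rows with identical values get the same code and therefore share a rank. To handle multiple columns, we convert each sorted row to a tuple.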

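    import pandas as pd

    cols = ['SaleCount', 'TotalRevenue']
    # Sort by both columns (descending), then turn each row into a tuple
    tups = df[cols].sort_values(cols, ascending=False).apply(tuple, axis=1)
    # factorize numbers the distinct tuples 0, 1, 2, ... in order of appearance
    codes, uniques = pd.factorize(tups)
    factorized = pd.Series(codes + 1, index=tups.index)

    # assign aligns on the index, so each row gets its own rank
    df.assign(Rank=factorized)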
    
             Date  SaleCount  TotalRevenue shops  Rank
    1  2016-12-02        100          9000    S2     1
    5  2016-12-02        100          2000    S8     2
    3  2016-12-02         35           750    S5     3
    2  2016-12-02         30          1000    S1     4
    7  2016-12-02         30           600    S7     5
    4  2016-12-02         20           500    S4     6
    9  2016-12-02         20           500   S10     6
    0  2016-12-02         10           300    S3     7
    8  2016-12-02          2            50    S9     8
    6  2016-12-02          0             0    S6     9
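
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same sort-then-factorize idea. The DataFrame is rebuilt from the output table above (the Date column is left out for brevity), and the helper name `dense_rank_by` is made up for illustration; it is not part of pandas.

```python
import pandas as pd

# Data reconstructed from the output table above, in index order 0..9.
df = pd.DataFrame({
    "shops": ["S3", "S2", "S1", "S5", "S4", "S8", "S6", "S7", "S9", "S10"],
    "SaleCount": [10, 100, 30, 35, 20, 100, 0, 30, 2, 20],
    "TotalRevenue": [300, 9000, 1000, 750, 500, 2000, 0, 600, 50, 500],
})


def dense_rank_by(frame, cols):
    """Hypothetical helper: rank rows by several columns, largest first.

    Rows with identical values in `cols` share a rank, and ranks have no gaps.
    """
    # Sort by all columns in descending order.
    ordered = frame[cols].sort_values(cols, ascending=False)
    # One tuple per row, so a row is treated as a single composite key.
    keys = ordered.apply(tuple, axis=1)
    # Codes follow order of first appearance, which is the sorted order here.
    codes, _ = pd.factorize(keys)
    # Keep the original index so the result aligns back onto `frame`.
    return pd.Series(codes + 1, index=keys.index, name="Rank")


ranked = df.assign(Rank=dense_rank_by(df, ["SaleCount", "TotalRevenue"]))
# Ranks in the original row order: [7, 1, 4, 3, 6, 2, 9, 5, 8, 6]
print(ranked.sort_values("Rank"))
```

Because the returned Series keeps the original index, `assign` puts each rank on the right row regardless of how `df` itself is ordered. S4 and S10 have the same SaleCount and TotalRevenue, so both receive rank 6.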
    
