How to sort Pandas DataFrame both by MultiIndex and by value?

Submitted by 北慕城南 on 2019-12-10 14:53:29

Question


Sample data:

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                [1,5,10],[2,8,80],
                [2,5,65],[2,8,10]
               ], columns=['src','dst','n'])
mdf

    src dst n
0   1   2   50
1   1   2   20
2   1   5   10
3   2   8   80
4   2   5   65
5   2   8   10

groupby() gives a two-level multi-index:

test = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); test

        sum count
src dst 
1   2   70  2
    5   10  1
2   5   65  1
    8   90  2

Question: how to sort this DataFrame by src ascending and then by sum descending?

I'm a beginner with pandas and have learned about sort_index() and sort_values(), but this task seems to need both at the same time.

Expected result: within each "src", the rows are ordered by "sum", largest first:

        sum count
src dst 
1   2   70  2
    5   10  1
2   8   90  2
    5   65  1

Answer 1:


IIUC:

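In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   5     65      1
    8     90      2

Note that sort_index() also sorts the remaining index levels by default (sort_remaining=True), so with the sample data from the question this puts dst 5 before dst 8 under src 2 and does not give the expected order.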
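
If the two-step idea (sort by value, then by index level) is still preferred, one possible variant is to pass sort_remaining=False so that only the src level is sorted. This is a minimal sketch built on the sample data from the question. It assumes the level sort is stable, so the descending order by sum from the first step survives within each src; that stability is an implementation detail I am assuming here, not something I have seen guaranteed in the documentation.

```python
import pandas as pd

mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=['src', 'dst', 'n'],
)
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

# Step 1: order all rows by sum, largest first.
by_sum = test.sort_values('sum', ascending=False)

# Step 2: sort on the src level only; sort_remaining=False leaves dst alone.
# Assumption: the level sort is stable, so rows sharing a src keep the
# order produced in step 1.
two_step = by_sum.sort_index(level='src', sort_remaining=False)
print(two_step)
```

Because this leans on sort stability, the single sort_values() call with both keys is probably the safer choice when the order really matters.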

UPDATE: very similar to @anonyXmous's solution:

In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[True,False])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
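
The reset / sort / set-index chain above can be wrapped into a small reusable function. This is only a sketch: sort_by_level_then_value is a hypothetical helper name invented for illustration, and it assumes exactly one index level sorted ascending and one column sorted descending.

```python
import pandas as pd


def sort_by_level_then_value(df, level, column):
    """Sort by one index level ascending, then by a column descending.

    Hypothetical helper; the name and signature are made up for this example.
    """
    index_names = list(df.index.names)
    return (
        df.reset_index()                 # index levels become ordinary columns
          .sort_values([level, column], ascending=[True, False])
          .set_index(index_names)        # rebuild the original MultiIndex
    )


mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=['src', 'dst', 'n'],
)
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

result = sort_by_level_then_value(test, 'src', 'sum')
print(result)
```

Reading the index names from the frame itself avoids hard-coding ['src','dst'] in set_index().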



Answer 2:


You can reset the index and then sort by the chosen columns. Hope this helps.

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                [1,5,10],[2,8,80],
                [2,5,65],[2,8,10]
               ], columns=['src','dst','n'])
mdf = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
# Turn the src/dst index levels back into ordinary columns
mdf.reset_index(inplace=True)
# src ascending, then sum descending within each src
mdf.sort_values(['src', 'sum'], ascending=[True, False], inplace=True)
print(mdf)

Result:

   src  dst  sum  count
0    1    2   70      2
1    1    5   10      1
3    2    8   90      2
2    2    5   65      1



Answer 3:


In case anyone else comes across this via Google: since pandas version 0.23, you can pass the name of an index level to sort_values, alongside column names:

test.sort_values(['src','sum'], ascending=[True,False])

Result:
         sum  count
src dst            
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
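
As a self-contained check of the call above, the sketch below rebuilds the sample data from the question and compares the direct sort_values() result with the reset_index / set_index route from the other answers. It assumes pandas 0.23 or later; the variable names direct and roundtrip are mine.

```python
import pandas as pd

mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=['src', 'dst', 'n'],
)
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

# 'src' is an index level name and 'sum' is a column label;
# both can appear together in the `by` list.
direct = test.sort_values(['src', 'sum'], ascending=[True, False])

# Cross-check against the longer reset_index / set_index route.
roundtrip = (
    test.reset_index()
        .sort_values(['src', 'sum'], ascending=[True, False])
        .set_index(['src', 'dst'])
)

print(direct)
print(list(direct.index) == list(roundtrip.index))
```

The direct form keeps the MultiIndex intact throughout, so no set_index() step is needed afterwards.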


Source: https://stackoverflow.com/questions/49264510/how-to-sort-pandas-dataframe-both-by-multiindex-and-by-value
