Python Pandas sorting multiindex by column, but retain tree structure

Submitted by 北战南征 on 2019-12-11 01:49:38

Question


Using pandas 0.20.3, I am trying to sort the n levels of a MultiIndexed DataFrame by the values of a column ('D'), in descending order, such that the hierarchy of the groups is maintained.

Example input:

                    D
A     B     C
Gran1 Par1  Child1  3
            Child2  7
            Child3  2
      Par2  Child1  9
            Child2  2
      Par3  Child1  6
Gran2 Par1  Child1  3
      Par2  Child1  8
            Child2  2
            Child3  3
      Par3  Child1  6
            Child2  8

Desired result:

                    D
A     B     C
Gran2 Par3  Child2  8
            Child1  6
      Par2  Child1  8
            Child3  3
            Child2  2
      Par1  Child1  3
Gran1 Par1  Child2  7
            Child1  3
            Child3  2
      Par2  Child1  9
            Child2  2
      Par3  Child1  6

Solutions to other problems about sorting and ordering multilevel indices seem to focus either on sorting by the index levels themselves or on keeping the index in order while sorting by a column. I did not find a multilevel sort that orders each index level by the aggregate of the column's values at that level. Any suggestions are greatly appreciated.
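To make "aggregate value at that level" concrete, here is a small sketch that rebuilds the example frame and computes the per-level sums. The desired result is consistent with ordering every level by the sum of D over its subtree: Gran2 totals 30 against 29 for Gran1, and within Gran2, Par3 = 14, Par2 = 13, Par1 = 3. The variable names grand_totals and parent_totals are illustrative only.

```python
import pandas as pd

# Rebuild the example input from the question
index = pd.MultiIndex.from_tuples(
    [('Gran1', 'Par1', 'Child1'), ('Gran1', 'Par1', 'Child2'),
     ('Gran1', 'Par1', 'Child3'), ('Gran1', 'Par2', 'Child1'),
     ('Gran1', 'Par2', 'Child2'), ('Gran1', 'Par3', 'Child1'),
     ('Gran2', 'Par1', 'Child1'), ('Gran2', 'Par2', 'Child1'),
     ('Gran2', 'Par2', 'Child2'), ('Gran2', 'Par2', 'Child3'),
     ('Gran2', 'Par3', 'Child1'), ('Gran2', 'Par3', 'Child2')],
    names=['A', 'B', 'C'])
df = pd.DataFrame({'D': [3, 7, 2, 9, 2, 6, 3, 8, 2, 3, 6, 8]}, index=index)

# Sum of D over each grandparent (level A) subtree
grand_totals = df.groupby(level='A')['D'].sum()
# Sum of D over each parent (levels A, B) subtree
parent_totals = df.groupby(level=['A', 'B'])['D'].sum()

print(grand_totals)
print(parent_totals)
```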


Answer 1:


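You need to create three separate arrays and sort by the combination of all of them. In this example, I use NumPy's np.lexsort to do the sorting and then use iloc to apply that order. np.lexsort treats the last array as the primary key, so rows are sorted by the grandparent total first, then by the parent total, then by D. At the end, I use a[::-1] to turn the ascending order into a descending one.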

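import numpy as np

a = np.lexsort([
    df.D.values,                                         # tertiary: the row's own value
    df.groupby(level=[0, 1]).D.transform('sum').values,  # secondary: parent (A, B) total
    df.groupby(level=0).D.transform('sum').values        # primary: grandparent (A) total
])

# lexsort is ascending; reverse the positions for a descending result
df.iloc[a[::-1]]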

                   D
A     B    C        
Gran2 Par3 Child2  8
           Child1  6
      Par2 Child1  8
           Child3  3
           Child2  2
      Par1 Child1  3
Gran1 Par1 Child2  7
           Child1  3
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
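One caveat with reversing a single lexsort: none of the three keys identifies the group itself, so if two subtrees had the same total, their rows could interleave. The sketch below is a hypothetical generalization to n levels (the helper name sort_tree is illustrative, not from either answer). It negates the numeric keys instead of reversing the result and adds an integer code per level label as a tie-breaker. It assumes the sorted column is numeric and uses only groupby(level=...).transform, pd.factorize and np.lexsort.

```python
import numpy as np
import pandas as pd


def sort_tree(df, col):
    """Hypothetical helper: order a MultiIndexed frame by `col`, descending,
    keeping every subtree contiguous. Siblings at each level are ranked by
    the sum of `col` over their subtree; leaf rows by their own value."""
    keys = []  # most significant key first
    for k in range(df.index.nlevels - 1):
        totals = df.groupby(level=list(range(k + 1)))[col].transform('sum')
        keys.append(-totals.values)  # negate so larger totals sort first
        # Tie-breaker: integer code per label at level k, so two subtrees
        # with equal totals cannot interleave
        codes, _ = pd.factorize(df.index.get_level_values(k))
        keys.append(codes)
    keys.append(-df[col].values)  # leaf rows: larger values first
    # np.lexsort treats the LAST key as primary, hence the reversal
    return df.iloc[np.lexsort(keys[::-1])]


index = pd.MultiIndex.from_tuples(
    [('Gran1', 'Par1', 'Child1'), ('Gran1', 'Par1', 'Child2'),
     ('Gran1', 'Par1', 'Child3'), ('Gran1', 'Par2', 'Child1'),
     ('Gran1', 'Par2', 'Child2'), ('Gran1', 'Par3', 'Child1'),
     ('Gran2', 'Par1', 'Child1'), ('Gran2', 'Par2', 'Child1'),
     ('Gran2', 'Par2', 'Child2'), ('Gran2', 'Par2', 'Child3'),
     ('Gran2', 'Par3', 'Child1'), ('Gran2', 'Par3', 'Child2')],
    names=['A', 'B', 'C'])
df = pd.DataFrame({'D': [3, 7, 2, 9, 2, 6, 3, 8, 2, 3, 6, 8]}, index=index)
result = sort_tree(df, 'D')
print(result)

# Illustrative tie case: X and Y both total 6, so reversing a plain lexsort
# would put the rows in the order X, Y, Y, X
tie = pd.DataFrame(
    {'D': [5, 4, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [('X', 'p', 'a'), ('Y', 'q', 'a'), ('Y', 'q', 'b'), ('X', 'p', 'b')],
        names=['A', 'B', 'C']))
tie_sorted = sort_tree(tie, 'D')
print(tie_sorted)
```

With the code tie-breaker, the tie case keeps both X rows together, followed by both Y rows.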



Answer 2:


You need reset_index to turn the MultiIndex levels into columns, then transform to compute the sum of D for each (A, B) group, then sort_values, and finally set_index to rebuild the MultiIndex:

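df = df.reset_index()
# G = sum of D within each (A, B) parent group, repeated on every row of that group
df['G'] = df.groupby(['A','B'])['D'].transform('sum')

# Sort by A, parent total and D (all descending), drop the helper column, restore the MultiIndex
df = df.sort_values(['A','G','D'], ascending=False).drop('G', axis=1).set_index(['A','B','C'])
print(df)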

                   D
A     B    C        
Gran2 Par3 Child2  8
           Child1  6
      Par2 Child1  8
           Child3  3
           Child2  2
      Par1 Child1  3
Gran1 Par1 Child2  7
           Child1  3
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
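As written, Answer 2 sorts column A by its label in descending order, not by the top-level total. That matches the desired output only because 'Gran2' happens to sort after 'Gran1' and also has the larger total (30 vs. 29). Here is a hedged variant of the same reset_index/transform approach. The helper columns GA and GB are illustrative names. It adds a top-level total and uses the labels A and B only as tie-breakers.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Gran1', 'Par1', 'Child1'), ('Gran1', 'Par1', 'Child2'),
     ('Gran1', 'Par1', 'Child3'), ('Gran1', 'Par2', 'Child1'),
     ('Gran1', 'Par2', 'Child2'), ('Gran1', 'Par3', 'Child1'),
     ('Gran2', 'Par1', 'Child1'), ('Gran2', 'Par2', 'Child1'),
     ('Gran2', 'Par2', 'Child2'), ('Gran2', 'Par2', 'Child3'),
     ('Gran2', 'Par3', 'Child1'), ('Gran2', 'Par3', 'Child2')],
    names=['A', 'B', 'C'])
df = pd.DataFrame({'D': [3, 7, 2, 9, 2, 6, 3, 8, 2, 3, 6, 8]}, index=index)

flat = df.reset_index()
# Subtree totals for both ancestor levels, repeated on every row
flat['GA'] = flat.groupby('A')['D'].transform('sum')
flat['GB'] = flat.groupby(['A', 'B'])['D'].transform('sum')
# Totals decide the order; the labels only break ties, so two groups
# with equal totals stay contiguous instead of interleaving
result = (flat.sort_values(['GA', 'A', 'GB', 'B', 'D'], ascending=False)
              .drop(['GA', 'GB'], axis=1)
              .set_index(['A', 'B', 'C']))
print(result)
```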


Source: https://stackoverflow.com/questions/47378149/python-pandas-sorting-multiindex-by-column-but-retain-tree-structure
