Question
Using pandas 0.20.3, I am trying to sort the n levels of a DataFrame's MultiIndex by the values of a column ('D'), in descending order, such that the hierarchy of the groups is maintained.
Example input:
                   D
A     B    C
Gran1 Par1 Child1  3
           Child2  7
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
Gran2 Par1 Child1  3
      Par2 Child1  8
           Child2  2
           Child3  3
      Par3 Child1  6
           Child2  8
Desired result:
                   D
A     B    C
Gran2 Par3 Child2  8
           Child1  6
      Par2 Child1  8
           Child3  3
           Child2  2
      Par1 Child1  3
Gran1 Par1 Child2  7
           Child1  3
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
Solutions to other problems about sorting and ordering multilevel indices seem to focus either on sorting the index levels themselves or on keeping the index in order while sorting a column. I did not find a multilevel sort where the column values are used to order the index by the aggregate value at each level. Any suggestions are greatly appreciated.
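For anyone who wants to reproduce the example, here is one way the input frame might be built. The variable names (rows, df, gran_totals, par_totals) are illustrative and not from the original post. Using the sum as the aggregate is an assumption, although it is consistent with the desired result (Gran2 totals 30 and Gran1 totals 29, so Gran2 comes first).

```python
import pandas as pd

# Rows copied from the example input in the question.
rows = [
    ("Gran1", "Par1", "Child1", 3),
    ("Gran1", "Par1", "Child2", 7),
    ("Gran1", "Par1", "Child3", 2),
    ("Gran1", "Par2", "Child1", 9),
    ("Gran1", "Par2", "Child2", 2),
    ("Gran1", "Par3", "Child1", 6),
    ("Gran2", "Par1", "Child1", 3),
    ("Gran2", "Par2", "Child1", 8),
    ("Gran2", "Par2", "Child2", 2),
    ("Gran2", "Par2", "Child3", 3),
    ("Gran2", "Par3", "Child1", 6),
    ("Gran2", "Par3", "Child2", 8),
]
df = pd.DataFrame(rows, columns=["A", "B", "C", "D"]).set_index(["A", "B", "C"])

# Totals of D per grandparent and per (grandparent, parent) group.
# Assumption: "aggregate value" means the sum of D within the group.
gran_totals = df.groupby(level="A")["D"].sum()
par_totals = df.groupby(level=["A", "B"])["D"].sum()

print(gran_totals)
print(par_totals)
```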
Answer 1:
You need to create three separate arrays and sort by the combination of all of them. In this example, I use NumPy's np.lexsort to do the sorting (the last key passed is the primary sort key) and then iloc to apply that order to the rows. At the end, I use a[::-1] to reverse the ascending sort into a descending one.
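import numpy as np

# np.lexsort treats the LAST key in the list as the primary sort key
a = np.lexsort([
    df.D.values,                                         # tie-breaker: the row value itself
    df.groupby(level=[0, 1]).D.transform('sum').values,  # secondary: total per (A, B) group
    df.groupby(level=0).D.transform('sum').values        # primary: total per A group
])
# lexsort is ascending, so reverse the positions for a descending result
df.iloc[a[::-1]]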
                   D
A     B    C
Gran2 Par3 Child2  8
           Child1  6
      Par2 Child1  8
           Child3  3
           Child2  2
      Par1 Child1  3
Gran1 Par1 Child2  7
           Child1  3
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
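Since the question asks about n levels, the lexsort idea could be generalized roughly as follows. This is only a sketch: sort_by_group_totals is a hypothetical helper name, not part of pandas or the original answer, and it assumes the sum is the aggregate at every level. One caveat worth keeping in mind is that if two sibling groups happen to have equal totals, their rows may interleave, because nothing in the keys distinguishes the groups from each other.

```python
import numpy as np
import pandas as pd


def sort_by_group_totals(df, col):
    """Hypothetical helper: order rows by descending group totals of `col`,
    from the outermost index level inward, then by `col` itself."""
    nlevels = df.index.nlevels
    keys = [df[col].values]  # least significant key: the row values
    # Append totals for progressively coarser groupings, e.g. for three
    # levels: levels [0, 1] first, then level [0].
    for depth in range(nlevels - 1, 0, -1):
        totals = df.groupby(level=list(range(depth)))[col].transform("sum")
        keys.append(totals.values)
    order = np.lexsort(keys)      # the last key is the primary sort key
    return df.iloc[order[::-1]]   # reverse the ascending order


# Example data from the question.
index = pd.MultiIndex.from_tuples(
    [
        ("Gran1", "Par1", "Child1"), ("Gran1", "Par1", "Child2"),
        ("Gran1", "Par1", "Child3"), ("Gran1", "Par2", "Child1"),
        ("Gran1", "Par2", "Child2"), ("Gran1", "Par3", "Child1"),
        ("Gran2", "Par1", "Child1"), ("Gran2", "Par2", "Child1"),
        ("Gran2", "Par2", "Child2"), ("Gran2", "Par2", "Child3"),
        ("Gran2", "Par3", "Child1"), ("Gran2", "Par3", "Child2"),
    ],
    names=["A", "B", "C"],
)
df = pd.DataFrame({"D": [3, 7, 2, 9, 2, 6, 3, 8, 2, 3, 6, 8]}, index=index)

result = sort_by_group_totals(df, "D")
print(result)
```

For the three-level example this builds the same three keys as the answer above, so it should reproduce the desired result from the question.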
Answer 2:
Use reset_index to turn the MultiIndex levels into columns, then transform to compute the sum of D per group, then sort_values, and finally set_index to restore the index:
df = df.reset_index()
# total of D per (A, B) group, broadcast to every row of that group
df['G'] = df.groupby(['A','B'])['D'].transform('sum')
df = df.sort_values(['A','G','D'], ascending=False).drop('G', axis=1).set_index(['A','B','C'])
print(df)
                   D
A     B    C
Gran2 Par3 Child2  8
           Child1  6
      Par2 Child1  8
           Child3  3
           Child2  2
      Par1 Child1  3
Gran1 Par1 Child2  7
           Child1  3
           Child3  2
      Par2 Child1  9
           Child2  2
      Par3 Child1  6
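Note that sorting on 'A' orders the top level by label, which matches the desired result here only because Gran2 also has the larger total (30 versus 29). If the top level should follow its total as well, a variant along these lines might work. The helper column names GA and GB are made up for this sketch, and the sum is again assumed to be the aggregate. Keeping 'A' and 'B' in the sort keys as tie-breakers is a design choice so that groups with equal totals still stay contiguous.

```python
import pandas as pd

# Example data from the question, already flattened into columns.
flat = pd.DataFrame({
    "A": ["Gran1"] * 6 + ["Gran2"] * 6,
    "B": ["Par1", "Par1", "Par1", "Par2", "Par2", "Par3",
          "Par1", "Par2", "Par2", "Par2", "Par3", "Par3"],
    "C": ["Child1", "Child2", "Child3", "Child1", "Child2", "Child1",
          "Child1", "Child1", "Child2", "Child3", "Child1", "Child2"],
    "D": [3, 7, 2, 9, 2, 6, 3, 8, 2, 3, 6, 8],
})

# Hypothetical helper columns: totals per A group and per (A, B) group.
flat["GA"] = flat.groupby("A")["D"].transform("sum")
flat["GB"] = flat.groupby(["A", "B"])["D"].transform("sum")

result = (
    flat.sort_values(["GA", "A", "GB", "B", "D"], ascending=False)
        .drop(["GA", "GB"], axis=1)      # remove the helper columns again
        .set_index(["A", "B", "C"])      # restore the three-level index
)
print(result)
```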
Source: https://stackoverflow.com/questions/47378149/python-pandas-sorting-multiindex-by-column-but-retain-tree-structure