Pandas hierarchical sort

Submitted by 亡梦爱人 on 2019-12-11 04:17:40

Question


I have a dataframe of categories and amounts. Categories can be nested into subcategories to any depth using a colon-separated string. I want to sort it by descending amount, but hierarchically, so that each category is followed by its own subcategories, as shown below.

How I need it sorted

CATEGORY                            AMOUNT
Transport                           5000
Transport : Car                     4900
Transport : Train                   100
Household                           1100
Household : Utilities               600
Household : Utilities : Water       400
Household : Utilities : Electric    200
Household : Cleaning                100
Household : Cleaning : Bathroom     75
Household : Cleaning : Kitchen      25
Household : Rent                    400
Living                              250
Living : Other                      150
Living : Food                       100

EDIT: The data frame:

df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train", "Household", "Household : Utilities", "Household : Utilities : Water", "Household : Utilities : Electric", "Household : Cleaning", "Household : Cleaning : Bathroom", "Household : Cleaning : Kitchen", "Household : Rent", "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400, 250, 150, 100]
})

Note: this is the order I want. The frame may be in any arbitrary order before the sort.


Answer 1:


To answer my own question: I found a way. It is somewhat long-winded, but here it is.

import numpy as np
import pandas as pd


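def sort_tree_df(df, tree_column, sort_column):
    # Work on a copy so the caller's frame keeps its index and columns.
    df = df.copy()
    sort_key = sort_column + '_abs'
    df[sort_key] = df[sort_column].abs()
    # One index level per depth of the tree; missing deeper levels become NaN.
    df.index = pd.MultiIndex.from_frame(
        df[tree_column].str.split(":").apply(lambda x: [y.strip() for y in x]).apply(pd.Series))
    # np.lexsort treats the last key as the primary one, so the keys run from
    # least significant (name, own amount) to most significant (top-level max).
    sort_columns = [df[tree_column].values, df[sort_key].values] + [
        df.groupby(level=list(range(0, x)))[sort_key].transform('max').values
        for x in range(df.index.nlevels - 1, 0, -1)
    ]
    sort_indexes = np.lexsort(sort_columns)
    # Reverse for descending order, then drop the helper index and column.
    df_sorted = df.iloc[sort_indexes[::-1]]
    return df_sorted.reset_index(drop=True).drop(columns=sort_key)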


sort_tree_df(df, 'category', 'amount')
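
The same ordering rule can also be written as a per-row sort key, which may be easier to follow than the lexsort version. The following is only a minimal sketch, not the method above: it assumes every parent category has its own row in the frame (a missing parent would raise a KeyError), and the names hierarchical_sort, path_key and sample are hypothetical, introduced here for illustration. Each row's key lists the (negated amount, name) pair of every ancestor from the root down, so a parent's key is a prefix of its children's keys and the parent sorts directly before them.

```python
import pandas as pd


def hierarchical_sort(df, tree_column="category", sort_column="amount", sep=":"):
    """Sort so each parent is followed by its children, largest amount first.

    Assumption: every ancestor category has its own row in df.
    """
    # Normalise each path to a tuple of stripped names, e.g. ("Household", "Rent").
    paths = [tuple(part.strip() for part in cat.split(sep)) for cat in df[tree_column]]
    # Look up the absolute amount of any category by its path.
    amount_of = dict(zip(paths, df[sort_column].abs()))

    def path_key(path):
        # One (negated amount, name) pair per ancestor, root first.
        # A shorter tuple that is a prefix of a longer one compares as smaller,
        # so a parent row always lands before its descendants.
        return tuple(
            (-amount_of[path[:depth]], path[depth - 1])
            for depth in range(1, len(path) + 1)
        )

    order = sorted(range(len(df)), key=lambda i: path_key(paths[i]))
    return df.iloc[order].reset_index(drop=True)


# A shuffled subset of the question's data.
sample = pd.DataFrame({
    "category": ["Living : Food", "Household : Rent", "Household", "Living",
                 "Household : Cleaning : Kitchen", "Household : Cleaning",
                 "Living : Other", "Household : Utilities",
                 "Household : Cleaning : Bathroom"],
    "amount": [100, 400, 1100, 250, 25, 100, 150, 600, 75],
})
print(hierarchical_sort(sample))
```

Because siblings are ordered strictly by descending amount, this sketch places Household : Rent (400) ahead of Household : Cleaning (100). Siblings with equal amounts are ordered by name, which keeps each subtree contiguous.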



Source: https://stackoverflow.com/questions/58888948/pandas-hierarchical-sort
