Question
Below is my dataframe:
In [2804]: df = pd.DataFrame({'A':[1,2,3,4,5,6], 'D':[{"value": '126', "perc": None, "unit": None}, {"value": 324, "perc": None, "unit": None}, {"value": 'N/A', "perc": None, "unit": None}, {}, {"value": '100', "perc": None, "unit": None}, np.nan]})
In [2794]: df.columns = pd.MultiIndex.from_product([df.columns, ['E']])
In [2807]: df
Out[2807]:
   A                                             D
   E                                             E
0  1  {'value': '126', 'perc': None, 'unit': None}
1  2    {'value': 324, 'perc': None, 'unit': None}
2  3  {'value': 'N/A', 'perc': None, 'unit': None}
3  4                                            {}
4  5  {'value': '100', 'perc': None, 'unit': None}
5  6                                           NaN
I need to sort the rows by the multi-level column (D, E) in descending order, using the value key of each dict.
As you can see, the value key can hold mixed datatypes such as int or string, and the cell itself can be an empty dict {} or NaN.
N/A and NaN entries should always appear last after sorting (both ascending and descending).
Expected output:
In [2814]: df1 = pd.DataFrame({'A':[2,1,5,3,4,6], 'D':[{"value": 324, "perc": None, "unit": None}, {"value": '126', "perc": None, "unit": None}, {"value": '100', "perc": None, "unit": None}, {"value": 'N/A', "perc": None, "unit": None}, {},np.nan]})
In [2799]: df1.columns = pd.MultiIndex.from_product([df1.columns, ['E']])
In [2811]: df1
Out[2811]:
   A                                             D
   E                                             E
0  2    {'value': 324, 'perc': None, 'unit': None}
1  1  {'value': '126', 'perc': None, 'unit': None}
2  5  {'value': '100', 'perc': None, 'unit': None}
3  3  {'value': 'N/A', 'perc': None, 'unit': None}
4  4                                            {}
5  6                                           NaN
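Answer 1:
Create a numeric helper column from the value key and sort by it. pd.to_numeric with errors='coerce' turns 'N/A', the empty dict and NaN into NaN, and sort_values places NaN last by default in both ascending and descending order:
# Extract 'value' from each dict and convert it to a number; non-numeric entries become NaN
df['tmp'] = pd.to_numeric(df[('D','E')].str.get('value'), errors='coerce')
# Sort descending by the helper column, then drop it from the result
df1 = df.sort_values('tmp', ascending=False).drop('tmp', axis=1)
print(df1)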
   A                                             D
   E                                             E
1  2    {'value': 324, 'perc': None, 'unit': None}
0  1  {'value': '126', 'perc': None, 'unit': None}
4  5  {'value': '100', 'perc': None, 'unit': None}
2  3  {'value': 'N/A', 'perc': None, 'unit': None}
3  4                                            {}
5  6                                           NaN
# Ascending order: the NaN rows still come last
df1 = df.sort_values('tmp').drop('tmp', axis=1)
print(df1)
   A                                             D
   E                                             E
4  5  {'value': '100', 'perc': None, 'unit': None}
0  1  {'value': '126', 'perc': None, 'unit': None}
1  2    {'value': 324, 'perc': None, 'unit': None}
2  3  {'value': 'N/A', 'perc': None, 'unit': None}
3  4                                            {}
5  6                                           NaN
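If you would rather not add and drop a temporary column on the original frame, the same idea can be wrapped in a small function that builds the numeric key separately and reorders the rows by position. This is only a sketch: the name sort_by_dict_value and its parameters are made up for illustration, and it extracts the key with an explicit isinstance check instead of .str.get.

```python
import numpy as np
import pandas as pd


def sort_by_dict_value(frame, column, key="value", ascending=True):
    """Return frame sorted by a numeric key inside dict cells; unparsable rows go last.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Pull the key out of each dict; non-dict cells (such as NaN) and
    # dicts without the key both yield None.
    raw = frame[column].map(
        lambda cell: cell.get(key) if isinstance(cell, dict) else None
    )
    # Convert to numbers; 'N/A', None and other non-numeric values become NaN.
    numeric = pd.to_numeric(raw, errors="coerce")
    # Sort the key by position so the result does not depend on index labels.
    # na_position="last" keeps NaN at the end for both sort directions.
    positions = (
        numeric.reset_index(drop=True)
        .sort_values(ascending=ascending, na_position="last", kind="stable")
        .index.to_numpy()
    )
    return frame.iloc[positions]


# Same data as in the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "D": [
        {"value": "126", "perc": None, "unit": None},
        {"value": 324, "perc": None, "unit": None},
        {"value": "N/A", "perc": None, "unit": None},
        {},
        {"value": "100", "perc": None, "unit": None},
        np.nan,
    ],
})
df.columns = pd.MultiIndex.from_product([df.columns, ["E"]])

desc = sort_by_dict_value(df, ("D", "E"), ascending=False)
asc = sort_by_dict_value(df, ("D", "E"))
```

The original df is left unchanged, and the column is addressed with its full tuple ('D', 'E'), so no partial MultiIndex label lookup is involved.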
Source: https://stackoverflow.com/questions/64571500/pandas-sort-a-multiindex-dataframes-multi-level-column-with-mixed-datatypes