Question
I'm wondering how best to approach this data-manipulation problem. What is the best way to sort one level of a MultiIndex in a DataFrame when that level's values are alphanumeric? The values are:
[u'0', u'1', u'10', u'11', u'2', u'2Y', u'3', u'3Y', u'4', u'4Y', u'5', u'5Y', u'6', u'7', u'8', u'9', u'9Y']
The result I'm searching for is:
[u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'2Y', u'3Y', u'4Y', u'5Y', u'9Y']
The plain numbers represent months, while an integer followed by 'Y' represents years.
Is there a way to sort the index?
Duration is one level of the MultiIndex; the other is sum. Here is a sample dataset:
Duration        2      2Y       3      3Y
customer
Invoice A   25.50    0.00    0.00   20.00
Invoice B   50.00   25.00  -10.50    0.00
Invoice C  125.00    0.00   11.20    0.50
Invoice D    0.00   15.00    0.00   80.10
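To try the answer below, the sample can be rebuilt as a DataFrame. This is a sketch under assumptions: the outer column level is taken to hold the label 'sum' (unnamed) and the inner level to be named 'Duration', with 'customer' as the row index name; the exact level names of the original frame are not shown.

```python
import pandas as pd

# Assumption: outer column level holds 'sum', inner level is named 'Duration'
columns = pd.MultiIndex.from_product(
    [['sum'], ['2', '2Y', '3', '3Y']], names=[None, 'Duration']
)
df = pd.DataFrame(
    [[25.50, 0.00, 0.00, 20.00],
     [50.00, 25.00, -10.50, 0.00],
     [125.00, 0.00, 11.20, 0.50],
     [0.00, 15.00, 0.00, 80.10]],
    index=pd.Index(['Invoice A', 'Invoice B', 'Invoice C', 'Invoice D'],
                   name='customer'),
    columns=columns,
)
print(df)
```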
Answer 1:
You can use the natsort package to sort the labels naturally, then move the non-numeric ones to the end. Here's an example:
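import natsort as ns
c = ['0', '1', '10', ...]  # the labels from the question
# natsorted orders the labels numerically ('2' before '10'); the stable sort then
# moves the non-digit labels ('2Y', ...) after the digit labels, keeping their order
c = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())
print(c)
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']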
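If adding the natsort dependency is not an option, a plain key function can encode the rule from the question directly: plain numbers (months) first, then 'Y' values (years), each compared as integers rather than as strings. A minimal sketch; duration_key is a hypothetical helper name, and the list drops the u'' prefixes from the Python 2 output.

```python
labels = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
          '5', '5Y', '6', '7', '8', '9', '9Y']

def duration_key(label):
    """Hypothetical sort key: months (plain numbers) before years ('Y' suffix),
    each group ordered numerically."""
    if label.endswith('Y'):
        return (1, int(label[:-1]))  # years: group 1, then by number of years
    return (0, int(label))           # months: group 0, then by number of months

print(sorted(labels, key=duration_key))
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
```

Unlike the natsort version, this raises a ValueError on a label that matches neither pattern, which may be preferable if the level should only ever contain durations.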
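For your DataFrame the approach is the same, with a reindex as the extra step (older pandas versions offered reindex_axis, which was removed in pandas 1.0):
import pandas as pd
c = df.columns.levels[1]  # assumes Duration is the second column level
# key=str.isdigit with reverse=True puts digit labels first; reverse=True keeps the
# sort stable, so each group stays in natural order, as in the lambda version above
c = sorted(ns.natsorted(c), key=str.isdigit, reverse=True)
df = df.reindex(columns=pd.MultiIndex.from_product([df.columns.levels[0], c], names=df.columns.names))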
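On pandas 1.1 or later, DataFrame.sort_index accepts a key callable, so the Duration level can also be sorted without natsort by converting each label to a number of months. A sketch under assumptions: to_months is a hypothetical helper, and the one-row frame below only mimics the question's column layout (outer level 'sum', inner level 'Duration').

```python
import pandas as pd

def to_months(label):
    """Hypothetical helper: '10' -> 10 months, '2Y' -> 24 months."""
    if label.endswith('Y'):
        return int(label[:-1]) * 12
    return int(label)

# One-row frame with the question's column layout
columns = pd.MultiIndex.from_product([['sum'], ['2', '2Y', '3', '3Y']],
                                     names=[None, 'Duration'])
df = pd.DataFrame([[25.50, 0.00, 0.00, 20.00]],
                  index=pd.Index(['Invoice A'], name='customer'),
                  columns=columns)

# With a MultiIndex, key is applied to the chosen level's values (an Index)
# and must return an Index of the same length; here, durations in months
df = df.sort_index(axis=1, level='Duration',
                   key=lambda idx: idx.map(to_months))
print(df.columns.get_level_values('Duration').tolist())
# ['2', '3', '2Y', '3Y']
```

Converting to months also means a hypothetical '1Y' label would sort right after '11', following duration order rather than a strict months-then-years grouping.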
Source: https://stackoverflow.com/questions/47239950/data-manipulation-sort-index-when-values-are-alphanumeric