Data Manipulation - Sort Index when values are Alphanumeric

人盡茶涼 提交于 2019-12-31 04:14:07

问题


I'm wondering how I should approach this data manipulation predicament. What is the best method to sort an index of a multi-index in a data frame where the values of on level of the index is alphanumeric. The values are:

[u'0', u'1', u'10', u'11', u'2', u'2Y', u'3', u'3Y', u'4', u'4Y', u'5', u'5Y', u'6', u'7', u'8', u'9', u'9Y']

The result I'm searching for is:

[u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'2Y', u'3Y', u'4Y', u'5Y', u'9Y']

The plain numeric values stand for months while the integer plus 'Y' stand for years.

Is there a way to sort the index?

Duration - is one level of the multi index, the second is sum. Please find a sample dataset below:

Duration                            2          2Y         3         3Y   
customer                                                                     
Invoice A                         25.50        0.00      0.00       20.00   
Invoice B                         50.00        25.00     -10.50     0.00
Invoice C                         125.00       0.00      11.20      0.50
Invoice D                         0.00        15.00      0.00       80.10

回答1:


You can use the natsort package to naturally sort your columns. Here's an example:

import natsort as ns

c =  ['0', '1', '10', ...]
c = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())

print(c)
['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '2Y',
 '3Y',
 '4Y',
 '5Y',
 '9Y']

For your problem, a similar approach follows with reindex_axis as the extra step:

c = df.columns.levels[1]
c = sorted(ns.natsorted(c), key=str.isdigit, reverse=True)

df = df.reindex_axis(pd.MultiIndex.from_product([df.columns.levels[0], c]), axis=1)


来源:https://stackoverflow.com/questions/47239950/data-manipulation-sort-index-when-values-are-alphanumeric

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!