Index sort order of a multi-index dataframe does not respect categorical index order

Submitted by 断了今生、忘了曾经 on 2021-02-09 08:34:57

Question


I have a small dataframe with a two-level MultiIndex and one column. The second level of the index (level 1, 'B') sorts in alphabetical order, putting 'Four' before 'Three'.

import pandas as pd
df = pd.DataFrame({'A':[1,1,2,2],
  'B':['One','Two','Three', 'Four'], 
  'X':[1,2,3,4]},
  index=range(4)).set_index(['A','B']).sort_index()
df

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

Clearly the second level of the index (B) is in alphabetical order, so I tried replacing it with an ordered categorical index to force the correct ordering.

# set_levels no longer accepts inplace= (removed in pandas 2.0), so assign the result back
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1], 
       categories=['One','Two','Three', 'Four'], ordered=True), 
    level=1)

With this done, inspecting the index shows that level 1 is indeed a categorical index. But sorting the index still does not put the rows in the desired order.

df.sort_index()

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

Note: If the dataframe has a simple single-level index, this works as expected.
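For comparison, here is a minimal sketch of the single-level case mentioned in the note. The row order and the variable names (order, single, sorted_single) are made up for illustration and are not part of the original code; the rows are deliberately scrambled so the sort has something to do.

```python
import pandas as pd

# Desired (non-alphabetical) order of the labels
order = ['One', 'Two', 'Three', 'Four']

# Single-level ordered CategoricalIndex, rows intentionally out of order
single = pd.DataFrame(
    {'X': [4, 3, 2, 1]},
    index=pd.CategoricalIndex(['Four', 'Three', 'Two', 'One'],
                              categories=order, ordered=True, name='B'))

# sort_index follows the category order rather than alphabetical order
sorted_single = single.sort_index()
print(sorted_single)
```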


Answer 1:


I got this to work by building the categorical index while setting the index, after the dataframe has been created. I'm not sure this is the best answer, but it is an answer:

df = pd.DataFrame({'A':[1,1,2,2],
   'B':['One','Two','Three', 'Four'], 
   'X':[1,2,3,4]})
df = df.set_index(['A', pd.CategoricalIndex(df['B'], categories=['One','Two','Three', 'Four'], ordered=True)])
del df['B']
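A possible variation on the same idea, offered only as a sketch: convert column B to an ordered categorical before calling set_index, so no separate del is needed. The scrambled row order and the names order and sorted_df are assumptions added for illustration; they are not from the original answer.

```python
import pandas as pd

# Desired (non-alphabetical) order of the level-1 labels
order = ['One', 'Two', 'Three', 'Four']

# Same rows as in the question, deliberately scrambled
df = pd.DataFrame({'A': [2, 1, 2, 1],
                   'B': ['Four', 'Two', 'Three', 'One'],
                   'X': [4, 2, 3, 1]})

# Make B an ordered categorical *before* it becomes an index level
df['B'] = pd.Categorical(df['B'], categories=order, ordered=True)

# The MultiIndex is built from the categorical, so sorting follows category order
sorted_df = df.set_index(['A', 'B']).sort_index()
print(sorted_df)
```

With this approach, 'Three' should sort ahead of 'Four' within A == 2, because the level is created from the categorical rather than swapped in afterwards.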


Source: https://stackoverflow.com/questions/49318345/index-sort-order-of-a-multi-index-dataframe-does-not-respect-categorical-index-o
