Multi-Index Sorting in Pandas

暖寄归人 2020-12-13 02:58

I have a multi-index DataFrame created via a groupby operation. I'm trying to do a compound sort using several levels of the index, but I can't seem to find a sort functi…

5 answers
  • 2020-12-13 03:23

    If you want to avoid multiple swaps within a very deep MultiIndex, you could also try this:

    1. Slice by level X (list comprehension + .loc + IndexSlice)
    2. Sort each slice by the desired level (originally sortlevel(2); in current pandas, sort_index(level=2))
    3. Concatenate the sorted groups back together

    Here is the code:

    import pandas as pd
    idx = pd.IndexSlice
    # Slice each level-0 group, sort it by level 2, then stitch the groups back together
    g = pd.concat([grouped.loc[idx[i, :, :], :].sort_index(level=2)
                   for i in grouped.index.levels[0]])
    g
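    A self-contained sketch of the slice/sort/concat steps. The raw DataFrame is not shown in the question, so `df` below is reconstructed from the output shown in the other answers (an assumption). It uses `sort_index(level=2)` in place of the removed `sortlevel(2)`:

```python
import pandas as pd

# Reconstructed sample data (assumption: matches the output shown in this thread)
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "Product Name": ["iPad", "iPod", "Galaxy", "Galaxy Tab"],
    "Product Launch Date": pd.to_datetime(
        ["2010-04-03", "2001-10-23", "2009-04-27", "2010-09-02"]),
    "Sales": [30, 34, 24, 22],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

idx = pd.IndexSlice
# For each level-0 value: slice its rows, sort them by level 2 (launch date),
# then concatenate the sorted blocks in level-0 order.
g = pd.concat([
    grouped.loc[idx[m, :, :], :].sort_index(level=2)
    for m in grouped.index.levels[0]
])
print(g)
```

    Each slice holds a single level-0 value, so sorting it by level 2 cannot disturb the top-level grouping.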
    
  • 2020-12-13 03:24

    A hack would be to change the order of the levels:

    In [11]: g
    Out[11]:
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPad         2010-04-03              30
                 iPod         2001-10-23              34
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22
    
    In [12]: g.index = g.index.swaplevel(1, 2)
    

    Then sort with sortlevel, which (as you've found) sorts by the MultiIndex levels in order (sortlevel has since been removed from pandas; sort_index does the same job):

    In [13]: g = g.sort_index()
    

    And swap back:

    In [14]: g.index = g.index.swaplevel(1, 2)
    
    In [15]: g
    Out[15]:
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPod         2001-10-23              34
                 iPad         2010-04-03              30
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22
    

    I'm of the opinion that sortlevel should not sort the remaining labels in order, so I will create a GitHub issue. :) That said, it's worth mentioning the docs note about "the need for sortedness".
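    To illustrate what "sortedness" means here, consider a custom order such as Manufacturer-then-date. That order leaves the index no longer lexicographically sorted, which `is_monotonic_increasing` reports, and a plain `sort_index()` restores lexsorted order. The two-level Series below is a hypothetical, cut-down version of the data in this thread:

```python
import pandas as pd

# Hypothetical two-level index, rows in the custom (Manufacturer, launch date) order
mi = pd.MultiIndex.from_tuples(
    [("Apple", "iPod"), ("Apple", "iPad"),
     ("Samsung", "Galaxy"), ("Samsung", "Galaxy Tab")],
    names=["Manufacturer", "Product Name"],
)
s = pd.Series([34, 30, 24, 22], index=mi, name="Sales")

# Not lexsorted: ("Apple", "iPod") sorts after ("Apple", "iPad")
custom_sorted = s.index.is_monotonic_increasing
# A full sort on all levels restores lexicographic order
lex_sorted = s.sort_index().index.is_monotonic_increasing
print(custom_sorted, lex_sorted)
```

    Some MultiIndex operations, notably label slicing with `.loc`, expect a lexsorted index, so it may be worth re-sorting before slicing.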

    Note: you could avoid the first swaplevel by changing the order of the keys in the initial groupby:

    g = df.groupby(['Manufacturer', 'Product Launch Date', 'Product Name']).sum()
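    The same swap/sort/swap hack can also be written as one chain on the DataFrame itself, because `DataFrame.swaplevel` and `DataFrame.sort_index` both return new objects. This is a sketch; the sample `df` is reconstructed from the output shown in this answer (an assumption, since the question's raw data is not shown):

```python
import pandas as pd

# Reconstructed sample data (assumption: matches the output shown in this answer)
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "Product Name": ["iPad", "iPod", "Galaxy", "Galaxy Tab"],
    "Product Launch Date": pd.to_datetime(
        ["2010-04-03", "2001-10-23", "2009-04-27", "2010-09-02"]),
    "Sales": [30, 34, 24, 22],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Swap Product Name and Launch Date, sort on all levels in order, then swap back
g = grouped.swaplevel(1, 2).sort_index().swaplevel(1, 2)
print(g)
```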
    
  • 2020-12-13 03:25

    To sort a MultiIndex by its "index columns" (a.k.a. levels), use the .sort_index() method and set its level argument. To sort by multiple levels, pass a list of level names in order of sort priority.

    This should give you the DataFrame you need:

    df.groupby(['Manufacturer',
                'Product Name',
                'Product Launch Date']
              ).sum().sort_index(level=['Manufacturer', 'Product Launch Date'])
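    Here is a runnable version of this answer. It assumes sample data reconstructed from the output shown elsewhere in this thread, plus the question's column name `Product Launch Date`. Because `sort_remaining` defaults to True, any level not listed (here `Product Name`) is still used to break ties:

```python
import pandas as pd

# Reconstructed sample data (assumption: matches the output shown in this thread)
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "Product Name": ["iPad", "iPod", "Galaxy", "Galaxy Tab"],
    "Product Launch Date": pd.to_datetime(
        ["2010-04-03", "2001-10-23", "2009-04-27", "2010-09-02"]),
    "Sales": [30, 34, 24, 22],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Sort by Manufacturer first, then by launch date within each manufacturer
result = grouped.sort_index(level=["Manufacturer", "Product Launch Date"])
print(result)
```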
    
  • 2020-12-13 03:28

    If you are not concerned about preserving the index (I often prefer an arbitrary integer index), you can just use the following one-liner:

    grouped.reset_index().sort_values(["Manufacturer", "Product Launch Date"])
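    A sketch of the flat approach, assuming sample data reconstructed from the output shown in this thread. `ignore_index=True` (pandas 1.0 or later) renumbers the rows 0..n-1 after sorting. The optional `set_index` step is one way to get the MultiIndex back if you later need it, and the new row order is kept:

```python
import pandas as pd

# Reconstructed sample data (assumption: matches the output shown in this thread)
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "Product Name": ["iPad", "iPod", "Galaxy", "Galaxy Tab"],
    "Product Launch Date": pd.to_datetime(
        ["2010-04-03", "2001-10-23", "2009-04-27", "2010-09-02"]),
    "Sales": [30, 34, 24, 22],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Flatten the index into columns, sort by the two keys, renumber rows 0..n-1
flat = grouped.reset_index().sort_values(
    ["Manufacturer", "Product Launch Date"], ignore_index=True)

# Optional: rebuild the MultiIndex; set_index keeps the current row order
restored = flat.set_index(["Manufacturer", "Product Name", "Product Launch Date"])
print(flat)
```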
    
  • 2020-12-13 03:32

    This one-liner works for me:

    In [1]: grouped.sort_index(level=["Manufacturer", "Product Launch Date"], sort_remaining=False)
    Out[1]:
                                                   Sales
    Manufacturer Product Name Product Launch Date       
    Apple        iPod         2001-10-23              34
                 iPad         2010-04-03              30
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22
    

    Note that level positions work too:

    grouped.sort_index(level=[0, 2], sort_remaining=False)
    

    This wouldn't have worked when you originally posted over two years ago, because sortlevel (the predecessor of sort_index(level=...)) by default sorted on ALL index levels, which mucked up your company hierarchy. The sort_remaining argument, which disables that behavior, was added last year. Here's the commit for reference: https://github.com/pydata/pandas/commit/3ad64b11e8e4bef47e3767f1d31cc26e39593277
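    A runnable check of both one-liners, assuming sample data reconstructed from the output shown in this answer. It sorts by level names and by level positions with `sort_remaining=False`, and the two results should be identical:

```python
import pandas as pd

# Reconstructed sample data (assumption: matches the output shown in this answer)
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "Product Name": ["iPad", "iPod", "Galaxy", "Galaxy Tab"],
    "Product Launch Date": pd.to_datetime(
        ["2010-04-03", "2001-10-23", "2009-04-27", "2010-09-02"]),
    "Sales": [30, 34, 24, 22],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Sort only on the listed levels; Product Name is not used as a tie-breaker
by_name = grouped.sort_index(
    level=["Manufacturer", "Product Launch Date"], sort_remaining=False)
# Same sort, addressing the levels by position (0 = Manufacturer, 2 = launch date)
by_pos = grouped.sort_index(level=[0, 2], sort_remaining=False)
print(by_name)
```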
