Pandas MultiIndex is entirely copied to a DataFrame slice

-上瘾入骨i 2020-12-11 09:16

I think there is a conceptual bug in the way a MultiIndex is created on a DataFrame slice. Consider the following code:

import cufflinks as cf
df=cf.datagen.li         


        
1 Answer
  • 2020-12-11 09:41

    A slice keeps all the levels of the original MultiIndex, even those it no longer uses. To drop them you need remove_unused_levels, which was added in pandas 0.20.0 (see the docs for details):

    new_df.columns.remove_unused_levels()
    

    Sample:

    import numpy as np
    import pandas as pd

    np.random.seed(23)
    cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                      ('Iter2', 'c'), ('Iter2', 'd'),
                                      ('Iter3', 'e'), ('Iter3', 'f')])
    idx = pd.date_range('2015-01-01', periods=5)
    df = pd.DataFrame(np.random.rand(5, 6), columns=cols, index=idx)
    print(df)
                   Iter1               Iter2               Iter3          
                       a         b         c         d         e         f
    2015-01-01  0.517298  0.946963  0.765460  0.282396  0.221045  0.686222
    2015-01-02  0.167139  0.392442  0.618052  0.411930  0.002465  0.884032
    2015-01-03  0.884948  0.300410  0.589582  0.978427  0.845094  0.065075
    2015-01-04  0.294744  0.287934  0.822466  0.626183  0.110478  0.000529
    2015-01-05  0.942166  0.141501  0.421597  0.346489  0.869785  0.428602
    

    new_df = df[['Iter1', 'Iter2']].copy()
    print(new_df)
                   Iter1               Iter2          
                       a         b         c         d
    2015-01-01  0.517298  0.946963  0.765460  0.282396
    2015-01-02  0.167139  0.392442  0.618052  0.411930
    2015-01-03  0.884948  0.300410  0.589582  0.978427
    2015-01-04  0.294744  0.287934  0.822466  0.626183
    2015-01-05  0.942166  0.141501  0.421597  0.346489
    
    print(new_df.columns)
    MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
    
    print(new_df.columns.remove_unused_levels())
    MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
    
    new_df.columns = new_df.columns.remove_unused_levels()
    
    print(new_df.columns)
    MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
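
    The printed form of a MultiIndex differs between pandas versions, so it can help to check the levels programmatically rather than reading the output. The following is a minimal sketch, assuming a reasonably recent pandas; the variable names (stale_levels, present_outer, trimmed) are illustrative only and the data is all zeros, since only the column index matters here.

```python
import numpy as np
import pandas as pd

# Same column layout as the sample above; the values are irrelevant here.
cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
df = pd.DataFrame(np.zeros((2, 6)), columns=cols)

new_df = df[['Iter1', 'Iter2']].copy()

# The levels of the slice still list every value of the original frame ...
stale_levels = [list(level) for level in new_df.columns.levels]
# ... while the values actually present are only those of the slice.
present_outer = list(new_df.columns.get_level_values(0).unique())

# remove_unused_levels() returns a new MultiIndex; it does not work in place.
trimmed = new_df.columns.remove_unused_levels()
trimmed_levels = [list(level) for level in trimmed.levels]

# Both indexes label the same columns, so they still compare equal.
same_labels = new_df.columns.equals(trimmed)

# Assign the result back to make the change stick.
new_df.columns = trimmed
```

    Because the trimmed index compares equal to the original one, assigning it back should not change how columns are selected; it only removes the leftover entries from .levels.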
    