Pandas MultiIndex is entirely copied to a DataFrame slice

前端未结


I think there is a conceptual bug in the way a MultiIndex is created on a DataFrame slice. Consider the following code:

import cufflinks as cf
df=cf.datagen.li


                      
1 answer

失恋的感觉
2020-12-11 09:41

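Slicing keeps every level value defined on the original MultiIndex, even those the slice no longer uses. You need remove_unused_levels, which is new in pandas 0.20.0 and returns a MultiIndex without them (see also the docs):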

new_df.columns.remove_unused_levels()
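
Note that the call does not modify anything in place. A minimal sketch on a bare MultiIndex, assuming a reasonably recent pandas (the names cols, sliced and trimmed are only illustrative):

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("Iter1", "a"), ("Iter2", "c"), ("Iter3", "e")])

# Slicing the index keeps all level values that were originally defined.
sliced = cols[:2]

# remove_unused_levels returns a new MultiIndex; `sliced` itself is unchanged.
trimmed = sliced.remove_unused_levels()

print(list(sliced.levels[0]))   # still includes 'Iter3'
print(list(trimmed.levels[0]))  # only the values actually used
```

Because the original index is left as it was, the result has to be assigned back if the DataFrame should carry the trimmed levels.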


Sample:

import numpy as np
import pandas as pd

np.random.seed(23)
cols = pd.MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                     ('Iter2','c'), ('Iter2','d'),
                                     ('Iter3','e'), ('Iter3','f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5,6), columns=cols, index=idx)
print (df)
               Iter1               Iter2               Iter3          
                   a         b         c         d         e         f
2015-01-01  0.517298  0.946963  0.765460  0.282396  0.221045  0.686222
2015-01-02  0.167139  0.392442  0.618052  0.411930  0.002465  0.884032
2015-01-03  0.884948  0.300410  0.589582  0.978427  0.845094  0.065075
2015-01-04  0.294744  0.287934  0.822466  0.626183  0.110478  0.000529
2015-01-05  0.942166  0.141501  0.421597  0.346489  0.869785  0.428602




new_df = df[['Iter1','Iter2']].copy()
print (new_df)
               Iter1               Iter2          
                   a         b         c         d
2015-01-01  0.517298  0.946963  0.765460  0.282396
2015-01-02  0.167139  0.392442  0.618052  0.411930
2015-01-03  0.884948  0.300410  0.589582  0.978427
2015-01-04  0.294744  0.287934  0.822466  0.626183
2015-01-05  0.942166  0.141501  0.421597  0.346489

print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

print (new_df.columns.remove_unused_levels())
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

new_df.columns = new_df.columns.remove_unused_levels()

print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
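
The printed output above comes from an older pandas. Since pandas 0.24 the labels attribute is called codes, and in more recent releases printing a MultiIndex may list its tuples rather than its levels, so the unused values are not visible in the repr. A hedged sketch that checks the levels attribute directly instead (the zero-filled data and the names before and after are only illustrative):

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([("Iter1", "a"), ("Iter1", "b"),
                                  ("Iter2", "c"), ("Iter2", "d"),
                                  ("Iter3", "e"), ("Iter3", "f")])
df = pd.DataFrame(np.zeros((2, 6)), columns=cols)

new_df = df[["Iter1", "Iter2"]].copy()

# Level values as stored on the slice, before trimming.
before = [list(level) for level in new_df.columns.levels]

# Assign the trimmed index back, as in the answer.
new_df.columns = new_df.columns.remove_unused_levels()
after = [list(level) for level in new_df.columns.levels]

print(before)
print(after)
```

The column labels and the data are the same before and after; only the stored level values shrink.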

                                                                        
                                                        
            
            
              
                