What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?


I've noticed three methods of selecting a column in a Pandas DataFrame:

First method of selecting a column using loc:

df_new = df.l


                      
3 Answers

时光说笑, 2020-12-02 05:10

In the following situations, they behave the same:


Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex, because loc slices by label and includes the end label.)
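
As a minimal sketch of the three equivalences above, the following uses a small made-up DataFrame (the column names and values are assumptions for illustration) with the default RangeIndex:

```python
import pandas as pd

# Hypothetical example frame with a default RangeIndex (0..4).
df = pd.DataFrame({
    "A": [10, 11, 12, 13, 14],
    "B": [20, 21, 22, 23, 24],
    "C": [30, 31, 32, 33, 34],
})

# Single column: both forms return the same Series.
same_single = df["A"].equals(df.loc[:, "A"])

# List of columns: both forms return the same DataFrame.
same_list = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])

# Row slice: [] and iloc are positional and exclude the stop position.
same_rows = df[1:3].equals(df.iloc[1:3])
rows_brackets = list(df[1:3].index)   # rows 1 and 2

# loc slices by label and includes the stop label.
rows_loc = list(df.loc[1:3].index)    # rows 1, 2 and 3
```

The last two lines show the one place where the forms diverge: the same slice 1:3 covers two rows with [] or iloc and three rows with loc.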


However, [] does not work in the following situations:


You can select a single row with df.loc[row_label] 
You can select a list of rows with df.loc[[row_label1, row_label2]] 
You can slice columns with df.loc[:, 'A':'C']


These three cannot be done with [].
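
A short sketch of these three loc-only selections, using a made-up frame with string row labels (the labels "x", "y", "z" and the column values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

row = df.loc["y"]             # single row, returned as a Series
rows = df.loc[["x", "z"]]     # list of rows, returned as a DataFrame
cols = df.loc[:, "A":"C"]     # column slice, both endpoints included

# With plain brackets, a scalar key is looked up among the columns,
# so passing a row label raises KeyError.
try:
    df["y"]
    bracket_row_lookup_failed = False
except KeyError:
    bracket_row_lookup_failed = True
```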
More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5


This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame, and pandas raises a SettingWithCopyWarning. The correct way to make this assignment is

df.loc[1:2, 'A'] = 5


With .loc, you are guaranteed to modify the original DataFrame. Note that the loc slice ends at 2 rather than 3 because loc includes the end label. .loc also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]). 
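
The single-call assignment can be sketched as follows. The frame is made up for illustration, and the iloc line is an added positional alternative rather than something from the answer:

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex (0..4).
df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 2, 3, 4, 5]})

# One indexing call selects rows and column together, so the
# assignment is applied to df itself.
# loc slices by label and includes the stop label, so 1:2 here
# touches the same rows that df[1:3] selects by position.
df.loc[1:2, "A"] = 5

# Positional alternative: iloc excludes the stop position, like [].
df.iloc[1:3, df.columns.get_loc("B")] = 0

result_a = df["A"].tolist()
result_b = df["B"].tolist()
```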

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.



Note: Getting columns with [] vs . is a completely different topic. . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used to create new columns (i.e. the assignment df.a = 1 won't add a column if there is no column a). Other than that, . and [] are the same.
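
The limits of attribute access can be illustrated with a small sketch. The column names "price", "unit cost" and "count" are assumptions chosen to trigger each case:

```python
import pandas as pd

# Hypothetical frame: one plain name, one name with a space,
# and one name that collides with the DataFrame.count method.
df = pd.DataFrame({"price": [1.0, 2.0], "unit cost": [0.5, 0.7], "count": [3, 4]})

# For a plain identifier, . and [] return the same Series.
same = df.price.equals(df["price"])

# A name containing a space is only reachable with [].
spaced = df["unit cost"].tolist()

# df.count resolves to the method, not the column.
is_method = callable(df.count)
count_col = df["count"].tolist()

# Assigning to a non-existing name sets a plain Python attribute;
# no column is created.
df.a = 1
created_by_attr = "a" in df.columns

# Brackets do create the column.
df["a"] = 1
created_by_brackets = "a" in df.columns
```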
                                                                        
                                                        
            
            
              
                
无人共我, 2020-12-02 05:15

loc is especially useful when the index is not numeric (e.g. a DatetimeIndex) because you can get rows with particular labels from the index:

df.loc['2010-05-04 07:00:00']
df.loc['2010-1-1 0:00:00':'2010-12-31 23:59:59', 'Price']
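
To make these two lookups concrete, here is a minimal sketch on a made-up price series. The timestamps and prices are assumptions, and the index is sorted, which label slicing relies on here:

```python
import pandas as pd

# Hypothetical hourly prices indexed by timestamp (sorted).
idx = pd.to_datetime([
    "2010-05-04 06:00:00",
    "2010-05-04 07:00:00",
    "2010-05-04 08:00:00",
    "2011-01-01 00:00:00",
])
df = pd.DataFrame({"Price": [10.0, 11.0, 12.0, 13.0]}, index=idx)

# Exact timestamp label: returns that row as a Series.
one_row = df.loc["2010-05-04 07:00:00"]

# Label slice over the year 2010, restricted to one column.
# Both endpoints are inclusive.
prices_2010 = df.loc["2010-01-01 00:00:00":"2010-12-31 23:59:59", "Price"]
```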


However, [] is intended to get columns with particular names:

df['Price']


With [] you can also filter rows, but it is more verbose:

df[df['Date'] < datetime.datetime(2010,1,1,7,0,0)]['Price']
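
For comparison, the same filter can be written as a single loc call that takes the boolean mask and the column together. This is a sketch on made-up data; the loc form is an added alternative, not part of the answer above:

```python
import datetime
import pandas as pd

# Hypothetical frame where the dates live in a regular column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2010-01-01 06:00", "2010-01-01 07:00", "2010-01-01 08:00"]
    ),
    "Price": [10.0, 11.0, 12.0],
})
cutoff = datetime.datetime(2010, 1, 1, 7, 0, 0)

# Two chained [] calls: filter rows first, then pick the column.
chained = df[df["Date"] < cutoff]["Price"]

# One loc call: boolean mask for rows, label for the column.
single = df.loc[df["Date"] < cutoff, "Price"]
```

Both forms read the same values; the loc form is the one that stays safe if the expression is later turned into an assignment.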

                                                                        
                                                        
            
            
              
                
名媛妹妹, 2020-12-02 05:17

There seems to be a difference between df.loc[] and df[] when you create multiple new columns in a DataFrame.

You can refer to this question:
Is there a nice way to generate multiple columns using .loc?

There, you can't generate multiple columns using df.loc[:, ['name1', 'name2']], but you can by just using double brackets: df[['name1', 'name2']]. (I wonder why they behave differently.)
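
The double-bracket form can be sketched as below, assuming a reasonably recent pandas version. The source columns "A" and "B" and the factor 10 are made up for illustration, and the loc variant is left out because its behaviour is reported to differ:

```python
import pandas as pd

# Hypothetical starting frame.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Assign a two-column DataFrame to two new column names at once.
# The new columns are filled in order from the right-hand side.
df[["name1", "name2"]] = df[["A", "B"]] * 10

new_columns = list(df.columns)
name1_values = df["name1"].tolist()
name2_values = df["name2"].tolist()
```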
                                                                        
                                                        
            
            
              
                