I want to add a column to a df. The values of this new column will depend on the values of the other columns, e.g.
dc = {'A':[0,9,4,5],'B':[6,0,10,12],
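(The snippet above is cut off; a minimal sketch of the setup, with the C values assumed from the output shown in the answers below:)

import pandas as pd

# Assumed full example frame; the C column is inferred from the answers' output below.
dc = {'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]}
df = pd.DataFrame(dc)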
Here's a start:
df['D'] = np.nan
df.loc[(df.A != 0) & (df.B != 0), 'D'] = df.A / df.B.astype(float) * df.C
Edit: you should probably just cast the whole thing to floats unless you really care about keeping integers for some reason:
df = df.astype(float)
and then you don't have to keep converting inside the calculation itself.
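Putting that together, a rough sketch of this approach (the C values and the fallback of 20 for the remaining rows are assumed from the other answers in this thread):

import numpy as np
import pandas as pd

# Example frame cast to floats up front, as suggested above.
df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]}).astype(float)

df['D'] = np.nan                                              # placeholder for rows not covered below
df.loc[(df.A != 0) & (df.B != 0), 'D'] = df.A / df.B * df.C   # only where both A and B are non-zero
df['D'] = df['D'].fillna(20)                                  # assumed fallback, per the other answers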
apply should work well for you:
In [20]: def func(row):
   ....:     if (row == 0).all():
   ....:         return 250.0
   ....:     elif (row[['A', 'B']] != 0).all():
   ....:         return (float(row['A']) / row['B']) * row['C']
   ....:     else:
   ....:         return 20
   ....:
In [21]: df['D'] = df.apply(func, axis=1)
In [22]: df
Out[22]:
   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5

[4 rows x 4 columns]
.where can be much faster than .apply, so if all you're doing is if/elses then I'd aim for .where. As you're returning scalars in some cases, np.where will be easier to use than pandas' own .where.
import pandas as pd
import numpy as np

df['D'] = np.where((df.A != 0) & (df.B != 0), (df.A / df.B) * df.C,
                   np.where((df.C == 0) & (df.A != 0) & (df.B == 0), 250,
                            20))
   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5
For a tiny df like this, you wouldn't need to worry about speed. However, on a 10000 row df of randn, this is almost 2000 times faster than the .apply solution above: 3ms vs 5850ms. That said, if speed isn't a concern, .apply can often be easier to read.
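If you want to check the trade-off on your own data, a rough benchmark along these lines should do (the 10000-row frame of random values mirrors the timing claim above; the exact numbers you get are only illustrative):

import timeit
import numpy as np
import pandas as pd

# 10000-row frame of random floats, as in the timing comparison above.
df = pd.DataFrame(np.random.randn(10000, 3), columns=['A', 'B', 'C'])

def func(row):
    # Same row-wise logic as the .apply answer above.
    if (row == 0).all():
        return 250.0
    elif (row[['A', 'B']] != 0).all():
        return (float(row['A']) / row['B']) * row['C']
    else:
        return 20

print('apply:', timeit.timeit(lambda: df.apply(func, axis=1), number=10))
print('where:', timeit.timeit(
    lambda: np.where((df.A != 0) & (df.B != 0), (df.A / df.B) * df.C,
                     np.where((df.C == 0) & (df.A != 0) & (df.B == 0), 250, 20)),
    number=10))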