Apply vs transform on a group object

前端未结

关注

 4  1415

别跟我提以往 2020-11-22 15:04

Consider the following dataframe:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.8338


      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   时光取名叫无心
                                             
                
                
                (楼主)
            
              
              
                2020-11-22 15:14
              

            
            
                        
As I felt similarly confused with .transform operation vs. .apply I found a few answers shedding some light on the issue. This answer for example was very helpful.

My takeout so far is that .transform will work (or deal) with Series (columns) in isolation from each other. What this means is that in your last two calls:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())


You asked .transform to take values from two columns and 'it' actually does not 'see' both of them at the same time (so to speak). transform will look at the dataframe columns one by one and return back a series (or group of series) 'made' of scalars which are repeated len(input_column) times.

So this scalar, that should be used by .transform to make the Series is a result of some reduction function applied on an input Series (and only on ONE series/column at a time).

Consider this example (on your dataframe):

zscore = lambda x: (x - x.mean()) / x.std() # Note that it does not reference anything outside of 'x' and for transform 'x' is one column.
df.groupby('A').transform(zscore)


will yield:

       C      D
0  0.989  0.128
1 -0.478  0.489
2  0.889 -0.589
3 -0.671 -1.150
4  0.034 -0.285
5  1.149  0.662
6 -1.404 -0.907
7 -0.509  1.653


Which is exactly the same as if you would use it on only on one column at a time:

df.groupby('A')['C'].transform(zscore)


yielding:

0    0.989
1   -0.478
2    0.889
3   -0.671
4    0.034
5    1.149
6   -1.404
7   -0.509


Note that .apply in the last example (df.groupby('A')['C'].apply(zscore)) would work in exactly the same way, but it would fail if you tried using it on a dataframe:

df.groupby('A').apply(zscore)


gives error:

ValueError: operands could not be broadcast together with shapes (6,) (2,)


So where else  is .transform useful? The simplest case is trying to assign results of reduction function back to original dataframe.

df['sum_C'] = df.groupby('A')['C'].transform(sum)
df.sort('A') # to clearly see the scalar ('sum') applies to the whole column of the group


yielding:

     A      B      C      D  sum_C
1  bar    one  1.998  0.593  3.973
3  bar  three  1.287 -0.639  3.973
5  bar    two  0.687 -1.027  3.973
4  foo    two  0.205  1.274  4.373
2  foo    two  0.128  0.924  4.373
6  foo    one  2.113 -0.516  4.373
7  foo  three  0.657 -1.179  4.373
0  foo    one  1.270  0.201  4.373


Trying the same with .apply would give NaNs in sum_C.
Because .apply would return a reduced Series, which it does not know how to broadcast back:

df.groupby('A')['C'].apply(sum)


giving:

A
bar    3.973
foo    4.373


There are also cases when .transform is used to filter the data:

df[df.groupby(['B'])['D'].transform(sum) < -1]

     A      B      C      D
3  bar  three  1.287 -0.639
7  foo  three  0.657 -1.179


I hope this adds a bit more clarity.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复