How to apply “first” and “last” functions to columns while using group by in pandas?

I have a data frame and I would like to group it by a particular column (or, in other words, by values from a particular column). I can do it in the following way: gro


                      
4 answers

南旧 (2020-12-08 07:09)

Instead of passing first or last as functions, pass their names as strings to the agg method. For example, in the OP's case:

grouped = df.groupby('ColumnName')
# named aggregation (pandas >= 0.25); the dict-renaming form was removed in pandas 1.0
grouped['D'].agg(result1='sum', result2='mean')

# 'first' and 'last' can be passed as strings in the same way
grouped['D'].agg(result1='first', result2='last')
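
As a minimal runnable sketch of the above: the column names ColumnName and D follow the answer, but the data is invented for illustration, and the keyword form of agg assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "ColumnName": ["a", "a", "a", "b", "b"],
    "D": [10, 20, 30, 40, 50],
})

grouped = df.groupby("ColumnName")

# 'first' and 'last' are passed as strings; the keywords name the output columns.
result = grouped["D"].agg(result1="first", result2="last")
print(result)
```

Each keyword becomes a column of the result, so result1 holds the first D value of every group and result2 the last.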

                                                                        
                                                        
            
            
              
                
青春惊慌失措 (2020-12-08 07:10)

I would use a custom aggregator, as shown below.

import pandas as pd
d = pd.DataFrame([[1, "man"], [1, "woman"], [1, "girl"], [2, "man"], [2, "woman"]], columns=["number", "family"])
d


Here is the output:

    number family
 0       1    man
 1       1  woman
 2       1   girl
 3       2    man
 4       2  woman


Now aggregate, taking the first and last element of each group:

d.groupby(by="number").agg(firstFamily=('family', lambda x: x.iloc[0]), lastFamily=('family', lambda x: x.iloc[-1]))


The output of this aggregation is shown below.

       firstFamily lastFamily
number                       
1              man       girl
2              man      woman
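
For comparison, here is a sketch (not part of the answer above) of the same named aggregation using the aggregator names 'first' and 'last' instead of lambdas. On this data, which has no missing values, both forms should produce the same table:

```python
import pandas as pd

d = pd.DataFrame(
    [[1, "man"], [1, "woman"], [1, "girl"], [2, "man"], [2, "woman"]],
    columns=["number", "family"],
)

# Positional lambdas, as in the answer.
by_lambda = d.groupby("number").agg(
    firstFamily=("family", lambda x: x.iloc[0]),
    lastFamily=("family", lambda x: x.iloc[-1]),
)

# The same aggregation spelled with string names.
by_name = d.groupby("number").agg(
    firstFamily=("family", "first"),
    lastFamily=("family", "last"),
)
print(by_name)
```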


I hope this helps.
                                                                        
                                                        
            
            
              
                
一向 (2020-12-08 07:14)

I'm not sure this is really the issue, but sum and min are Python built-ins that take an iterable as input, whereas first is a method of the pandas Series object, so it may simply not be in your namespace. It also takes a different argument (the docs say an offset value).

One way around it might be to define your own first function that takes a Series object as input, e.g.:

def first(series, offset):
    # Series.first(offset) selects the leading rows of a date-indexed Series
    # that fall within `offset`; it is deprecated since pandas 2.1
    return series.first(offset)


or something like that.
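
One way to make this idea concrete, as a sketch under assumptions: an aggregator passed to agg receives each group's values as a Series, so a helper that returns the first element by position is enough. The helper name first_value and the sample data are made up for illustration.

```python
import pandas as pd

def first_value(series):
    """Return the first element of a group's Series by position."""
    return series.iloc[0]

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"key": ["x", "x", "y"], "val": [3, 1, 2]})

# agg calls first_value once per group, passing that group's 'val' Series.
result = df.groupby("key")["val"].agg(first_value)
print(result)
```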
                                                                        
                                                        
            
            
              
                
情话喂你 (2020-12-08 07:18)

I think the issue is that there are two different first methods that share a name but behave differently: one is for groupby objects and the other is for a Series/DataFrame (and has to do with time series).

To replicate the behaviour of the groupby first method over a DataFrame using agg, you can use iloc[0], which gets the first row of each group (DataFrame/Series) by position:

grouped.agg(lambda x: x.iloc[0])


For example:

In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

In [2]: g = df.groupby(0)

In [3]: g.first()
Out[3]: 
   1
0   
1  2
3  4

In [4]: g.agg(lambda x: x.iloc[0])
Out[4]: 
   1
0   
1  2
3  4


Analogously you can replicate last using iloc[-1].
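
A short sketch of the last case, using a made-up frame with two rows in the first group so that first and last differ:

```python
import pandas as pd

# Hypothetical data: group 1 has two rows, group 3 has one.
df = pd.DataFrame([[1, 2], [1, 5], [3, 4]])
g = df.groupby(0)

last_builtin = g.last()                         # groupby's own last
last_by_position = g.agg(lambda x: x.iloc[-1])  # last row of each group
print(last_by_position)
```

With no missing values in the data, both results hold the same values per group.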

Note: this also works column-wise:

g.agg({1: lambda x: x.iloc[0]})


In older versions of pandas you would use the irow method (e.g. x.irow(0)); see previous edits.



A couple of updated notes:

This is better done using the groupby nth method, which is much faster in pandas >= 0.13:

g.nth(0)  # first
g.nth(-1)  # last
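
A small sketch of nth on made-up data. One caveat, to my knowledge: since pandas 2.0, nth acts as a filter and returns the selected rows with their original index, whereas earlier versions indexed the result by group key. The example therefore only inspects the selected values, which should be the same either way:

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")

first_vals = g.nth(0)["val"].tolist()   # first row of each group
last_vals = g.nth(-1)["val"].tolist()   # last row of each group
print(first_vals, last_vals)
```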


Take a little care here: by default, first and last skip NaN values, while nth does not unless you use its dropna option. IIRC this was also broken for DataFrame groupbys before 0.13.
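
To illustrate the NaN caveat, here is a sketch on made-up data where the first row of one group is missing. It compares the groupby first method with the positional alternatives:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first 'val' in group "a" is NaN.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [np.nan, 1.0, 2.0]})
g = df.groupby("key")["val"]

skips_nan = g.first()                     # first non-NaN value per group
by_position = g.agg(lambda x: x.iloc[0])  # first row per group, NaN included
nth_default = g.nth(0)                    # also positional by default
print(skips_nan.tolist(), by_position.tolist(), nth_default.tolist())
```

So the two approaches are only interchangeable when the relevant rows contain no missing values.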

You can use strings rather than built-ins (though IIRC pandas spots that it's the sum builtin and applies np.sum):

# named aggregation; the dict-renaming form was removed in pandas 1.0
grouped['D'].agg(result1="sum", result2="mean")

                                                                        
                                                        
            
            
              
                