What is the equivalent of SQL “GROUP BY HAVING” on Pandas?

前端未结

关注

 2  475

what would be the most efficient way to use groupby and in parallel apply a filter in pandas?

Basically I am asking for the equivalent in SQL of

sele


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2020-12-05 02:09
              
            
            
                                                                       
As mentioned in unutbu's comment, groupby's filter is the equivalent of SQL'S HAVING:

In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  3
2  5  6

In [13]: g = df.groupby('A')  #  GROUP BY A

In [14]: g.filter(lambda x: len(x) > 1)  #  HAVING COUNT(*) > 1
Out[14]:
   A  B
0  1  2
1  1  3


You can write more complicated functions (these are applied to each group), provided they return a plain ol' bool:

In [15]: g.filter(lambda x: x['B'].sum() == 5)
Out[15]:
   A  B
0  1  2
1  1  3


Note: potentially there is a bug where you can't write you function to act on the columns you've used to groupby... a workaround is the groupby the columns manually i.e. g = df.groupby(df['A'])).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  -上瘾入骨i        
                
              
                            
                2020-12-05 02:18
              
            
            
                                                                       
I group by state and county where max is greater than 20 then subquery the resulting values for True using the dataframe loc

counties=df.groupby(['state','county'])['field1'].max()>20
counties=counties.loc[counties.values==True]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复