I have a pandas DataFrame where column B contains NumPy arrays of fixed size.
| A | B | C |
|---|---|---|
What you need is possible by converting the values to a 2D array and then using np.mean:
import numpy as np

# stack each group's lists into a 2D array and average down the rows
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print (df2)
C B
0 X [1.5, 2.5, 4.0, 5.0]
1 Y [2.0, 3.0, 4.0, 4.0]
2 Z [2.0, 3.0, 5.0, 6.0]
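A minimal, self-contained sketch of this approach. The sample data here is hypothetical (the question's full df is not shown), but it is chosen so the groups reproduce the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data consistent with the output shown above
df = pd.DataFrame({
    'C': ['X', 'X', 'Y', 'Z'],
    'B': [[1, 2, 3, 4], [2, 3, 5, 6], [2, 3, 4, 4], [2, 3, 5, 6]],
})

# tolist() gives a list of lists, np.array makes it 2D,
# and axis=0 averages element-wise across the group's rows
f = lambda x: np.mean(np.array(x.tolist()), axis=0)
df2 = df.groupby('C')['B'].apply(f).reset_index()
print(df2)
```

The key point is `axis=0`: without it, `np.mean` would collapse each group to a single scalar instead of an element-wise mean.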
One last solution is possible, but it is less efficient (thanks to @Abhik Sarkar for the test):
df1 = pd.DataFrame(df.B.tolist()).groupby(df['C']).mean()
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print (df2)
B C
0 [1.5, 2.5, 4.0, 5.0] X
1 [2.0, 3.0, 4.0, 4.0] Y
2 [2.0, 3.0, 5.0, 6.0] Z
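To see why this second variant works, it helps to look at the intermediate wide frame: `pd.DataFrame(df.B.tolist())` expands each list into its own column, so a plain column-wise `mean` per group is the element-wise mean. A sketch with hypothetical sample data (assumed, matching the outputs above):

```python
import pandas as pd

# Hypothetical sample data (the question's full df is not shown)
df = pd.DataFrame({
    'C': ['X', 'X', 'Y', 'Z'],
    'B': [[1, 2, 3, 4], [2, 3, 5, 6], [2, 3, 4, 4], [2, 3, 5, 6]],
})

# each list element becomes its own column (0, 1, 2, 3),
# index-aligned with df, so grouping by df['C'] still works
wide = pd.DataFrame(df.B.tolist())
df1 = wide.groupby(df['C']).mean()

# collapse the per-column means back into lists
df2 = pd.DataFrame({'B': df1.values.tolist(), 'C': df1.index})
print(df2)
```

The extra cost comes from materializing the wide frame and rebuilding the lists afterwards, which is why the `apply` version above is preferred.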
Dummy data
import random
import pandas as pd

size, list_size = 10, 5
data = [{'C': random.randint(95, 100),
         'B': [random.randint(0, 10) for i in range(list_size)]} for j in range(size)]
df = pd.DataFrame(data)
Custom Aggregation Using numpy
import numpy as np

unique_C = df.C.unique()
data_calculated = []
axis = 0
for c in unique_C:
    # concatenate the group's lists and reshape back to (n_rows, list_size)
    arr = np.reshape(np.hstack(df[df.C == c]['B']), (-1, list_size))
    mean, std = arr.mean(axis=axis), arr.std(axis=axis)  # other aggregations can also be added
    data_calculated.append(dict(C=c, B_mean=mean, B_std=std))
new_df = pd.DataFrame(data_calculated)
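Putting the dummy data and the custom aggregation together, here is a runnable sketch (the seed is an assumption added for reproducibility; the result can be cross-checked against the groupby/apply approach from the first answer):

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # hypothetical seed, only for reproducible dummy data
size, list_size = 10, 5
data = [{'C': random.randint(95, 100),
         'B': [random.randint(0, 10) for i in range(list_size)]} for j in range(size)]
df = pd.DataFrame(data)

data_calculated = []
for c in df.C.unique():
    # flatten the group's lists and reshape to (n_rows_in_group, list_size)
    arr = np.reshape(np.hstack(df[df.C == c]['B']), (-1, list_size))
    data_calculated.append(dict(C=c, B_mean=arr.mean(axis=0), B_std=arr.std(axis=0)))
new_df = pd.DataFrame(data_calculated)
print(new_df)
```

An advantage of this loop is that it computes several aggregations (here mean and std) in one pass over each group's 2D array.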