How to speed up Pandas multilevel dataframe shift by group?

后端未结

关注

 4  1914

离开以前 2021-01-22 03:36

I am trying to shift the Pandas dataframe column data by group of first index. Here is the demo code:

 In [8]: df = mul_df(5,4,3)

In [9]: df
Out[9]:


      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   半阙折子戏
                                             
                
                
                (楼主)
            
              
              
                2021-01-22 04:02
              

            
            
                        
similar question and added answer with that works for shift in either direction and magnitude: pandas: setting last N rows of multi-index to Nan for speeding up groupby with shift

Code (including test setup) is:

#
# the function to use in apply
#
def replace_shift_overlap(grp,col,N,value):
    if (N > 0):
        grp[col][:N] = value
    else:
        grp[col][N:] = value
    return grp


length = 5
groups = 3
rng1 = pd.date_range('1/1/1990', periods=length, freq='D')
frames = []
for x in xrange(0,groups):
    tmpdf = pd.DataFrame({'date':rng1,'category':int(10000000*abs(np.random.randn())),'colA':np.random.randn(length),'colB':np.random.randn(length)})
    frames.append(tmpdf)
df = pd.concat(frames)

df.sort(columns=['category','date'],inplace=True)
df.set_index(['category','date'],inplace=True,drop=True)
shiftBy=-1
df['tmpShift'] = df['colB'].shift(shiftBy)

# 
# the apply
#
df = df.groupby(level=0).apply(replace_shift_overlap,'tmpShift',shiftBy,np.nan)

# Yay this is so much faster.
df['newColumn'] = df['tmpShift'] / df['colA']
df.drop('tmpShift',1,inplace=True)


EDIT: Note that the initial sort really eats into the effectiveness of this. So in some cases the original answer is more effective. 
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复