Pandas interpolate within a groupby

误落风尘 · 2020-11-28 11:47

I've got a dataframe with the following information:

    filename    val1    val2
t
1   file1.csv   5       10
2   file1.csv   NaN     NaN
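
For reference, a frame like this can be rebuilt and interpolated within each filename group roughly as follows. This is only a sketch: the second file and every value beyond the rows shown above are invented to fill out the truncated sample.

    import numpy as np
    import pandas as pd

    # Rebuild the sample; the second file and the extra values are hypothetical.
    df = pd.DataFrame(
        {'filename': ['file1.csv', 'file1.csv', 'file1.csv', 'file2.csv', 'file2.csv'],
         'val1': [5, np.nan, 15, 3, np.nan],
         'val2': [10, np.nan, 20, 6, 12]},
        index=pd.Index([1, 2, 3, 4, 5], name='t'))

    # Interpolate val1/val2 within each file so values never bleed across
    # file boundaries; transform keeps the original index, so the result
    # can be written straight back.
    df[['val1', 'val2']] = (df.groupby('filename')[['val1', 'val2']]
                            .transform(lambda g: g.interpolate()))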


        
3 Answers
夕颜 · 2020-11-28 12:51

    I ran into this as well. Instead of using apply, you can use transform, which will reduce your run time by more than 25% if you have on the order of 1000 groups:

    import numpy as np
    import pandas as pd
    
    np.random.seed(500)
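    # 10,000 rows; 'a' gives roughly 1,000 groups and about 1% of 'b' is NaN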
    test_df = pd.DataFrame({
        'a': np.random.randint(low=0, high=1000, size=10000),
        'b': np.random.choice([1, 2, 4, 7, np.nan], size=10000, p=([0.2475]*4 + [0.01]))
    })
    

    Tests:

    %timeit test_df.groupby('a').transform(pd.DataFrame.interpolate)
    

    Output: 566 ms ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit test_df.groupby('a').apply(pd.DataFrame.interpolate)
    

    Output: 788 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit test_df.groupby('a').apply(lambda group: group.interpolate())
    

    Output: 787 ms ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit test_df.interpolate()
    

    Output: 918 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    You will still see a large gap compared to a fully vectorized call to interpolate on the full DataFrame (hundreds of milliseconds versus well under a millisecond in the timings above), but I don't think you can do much better than transform in pandas.
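
    If you want the interpolated values written back into the original frame, the transform result is aligned to the frame's index and can be assigned directly. A minimal sketch using the test_df above:

    # transform returns a Series indexed like test_df, so it can be
    # assigned straight back to the column being filled
    test_df['b'] = test_df.groupby('a')['b'].transform(pd.Series.interpolate)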
