numpy float: 10x slower than builtin in arithmetic operations?

前端未结

关注

 8  1056

别跟我提以往 2020-12-01 05:00

I am getting really weird timings for the following code:

import numpy as np
s = 0
for i in range(10000000):
    s += np.float64(1) # replace with np.float32


      
      
        
          8条回答        

        
                    
            
            
                         
                
              
              
                
                   甜味超标
                                             
                
                
                (楼主)
            
              
              
                2020-12-01 05:50
              

            
            
                        
Summary

If an arithmetic expression contains both numpy and built-in numbers, Python arithmetics works slower. Avoiding this conversion removes almost all of the performance degradation I reported.

Details

Note that in my original code:

s = np.float64(1)
for i in range(10000000):
  s = (s + 8) * s % 2399232


the types float and numpy.float64 are mixed up in one expression. Perhaps Python had to convert them all to one type?

s = np.float64(1)
for i in range(10000000):
  s = (s + np.float64(8)) * s % np.float64(2399232)


If the runtime is unchanged (rather than increased), it would suggest that's what Python indeed was doing under the hood, explaining the performance drag.

Actually, the runtime fell by 1.5 times! How is it possible? Isn't the worst thing that Python could possibly have to do was these two conversions? 

I don't really know. Perhaps Python had to dynamically check what needs to be converted into what, which takes time, and being told what precise conversions to perform makes it faster. Perhaps, some entirely different mechanism is used for arithmetics (which doesn't involve conversions at all), and it happens to be super-slow on mismatched types. Reading numpy source code might help, but it's beyond my skill.

Anyway, now we can obviously speed things up more by moving the conversions out of the loop:

q = np.float64(8)
r = np.float64(2399232)
for i in range(10000000):
  s = (s + q) * s % r


As expected, the runtime is reduced substantially: by another 2.3 times.

To be fair, we now need to change the float version slightly, by moving the literal constants out of the loop. This results in a tiny (10%) slowdown.

Accounting for all these changes, the np.float64 version of the code is now only 30% slower than the equivalent float version; the ridiculous 5-fold performance hit is largely gone.

Why do we still see the 30% delay? numpy.float64 numbers take the same amount of space as float, so that won't be the reason. Perhaps the resolution of the arithmetic operators takes longer for user-defined types. Certainly not a major concern.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它8个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复