Why is subtraction faster than addition in Python?

后端未结

关注

 10  1395

佛祖请我去吃肉 2020-12-13 12:59

I was optimising some Python code, and tried the following experiment:

import time

start = time.clock()
x = 0
for i in range(10000000):
    x += 1
end = tim


      
      
        
          10条回答        

        
                    
            
            
                         
                
              
              
                
                   萌比男神i
                                             
                
                
                (楼主)
            
              
              
                2020-12-13 13:35
              

            
            
                        
I can reproduce this on my Q6600 (Python 2.6.2); increasing the range to 100000000:

('+=', 11.370000000000001)
('-=', 10.769999999999998)


First, some observations:


This is 5% for a trivial operation.  That's significant.
The speed of the native addition and subtraction opcodes is irrelevant.  It's in the noise floor, completely dwarfed by the bytecode evaluation.  That's talking about one or two native instructions around thousands.
The bytecode generates exactly the same number of instructions; the only difference is INPLACE_ADD vs. INPLACE_SUBTRACT and +1 vs -1.


Looking at the Python source, I can make a guess.  This is handled in ceval.c, in PyEval_EvalFrameEx.  INPLACE_ADD has a significant extra block of code, to handle string concatenation.  That block doesn't exist in INPLACE_SUBTRACT, since you can't subtract strings.  That means INPLACE_ADD contains more native code.  Depending (heavily!) on how the code is being generated by the compiler, this extra code may be inline with the rest of the INPLACE_ADD code, which means additions can hit the instruction cache harder than subtraction.  This could be causing extra L2 cache hits, which could cause a significant performance difference.

This is heavily dependent on the system you're on (different processors have different amounts of cache and cache architectures), the compiler in use, including the particular version and compilation options (different compilers will decide differently which bits of code are on the critical path, which determines how assembly code is lumped together), and so on.

Also, the difference is reversed in Python 3.0.1 (+: 15.66, -: 16.71); no doubt this critical function has changed a lot.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它10个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复