Why is subtraction faster than addition in Python?

佛祖请我去吃肉 2020-12-13 12:59

I was optimising some Python code, and tried the following experiment:

import time

start = time.clock()
x = 0
for i in range(10000000):
    x += 1
end = time.clock()
print '+=', end - start

start = time.clock()
x = 0
for i in range(10000000):
    x -= -1
end = time.clock()
print '-=', end - start

The second loop is reliably faster than the first. Why? Is there a more general programming lesson here?

10 Answers
  • 2020-12-13 13:31

    That would be remarkable, so I have evaluated your code thoroughly and also set up the experiment in the way I consider more correct (all declarations and function calls outside the loop). I ran both versions five times.

    • Running your code validated your claims: -= consistently takes less time, 3.6% on average.
    • Running my code, though, contradicts the outcome of your experiment: += takes, on average (though not always), 0.5% less time.

    To show all results I have put plots online:

    • Your evaluation: http://bayimg.com/kadAeaAcN
    • My evaluation: http://bayimg.com/KadaAaAcN

    So I conclude that your experiment has a significant bias.

    Finally here is my code:

    import time

    addtimes = [0.] * 100
    subtracttimes = [0.] * 100

    # Build both ranges once, outside the timed loops.
    range100 = range(100)
    range10000000 = range(10000000)

    j = 0
    i = 0
    x = 0
    start = 0.

    # Time 100 runs of the += loop.
    for j in range100:
        start = time.clock()
        x = 0
        for i in range10000000:
            x += 1
        addtimes[j] = time.clock() - start

    # Time 100 runs of the subtraction loop; subtracting -1 keeps the
    # arithmetic identical to adding 1.
    for j in range100:
        start = time.clock()
        x = 0
        for i in range10000000:
            x -= -1
        subtracttimes[j] = time.clock() - start

    print '+=', sum(addtimes)
    print '-=', sum(subtracttimes)
    
  • 2020-12-13 13:35

    I can reproduce this on my Q6600 (Python 2.6.2); increasing the range to 100000000:

    ('+=', 11.370000000000001)
    ('-=', 10.769999999999998)
    

    First, some observations:

    • This is 5% for a trivial operation. That's significant.
    • The speed of the native addition and subtraction instructions is irrelevant. It's in the noise floor, completely dwarfed by the bytecode evaluation; we're talking about one or two native instructions among thousands.
    • The generated bytecode contains exactly the same number of instructions; the only difference is INPLACE_ADD vs. INPLACE_SUBTRACT and +1 vs. -1 (see the dis sketch below).
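
    To see the claim in the second bullet for yourself, here is a minimal sketch (my addition, not part of the original answer) that dumps the bytecode of both loop bodies with the dis module; the function names are illustrative. On the CPython versions discussed here the loops compile to INPLACE_ADD vs. INPLACE_SUBTRACT with constants 1 and -1; CPython 3.11 and later instead emit a single BINARY_OP instruction whose argument selects the operator.

    import dis

    def add_loop():
        x = 0
        for i in range(10000000):
            x += 1
        return x

    def sub_loop():
        x = 0
        for i in range(10000000):
            x -= -1
        return x

    # Disassemble both functions and compare the instruction streams:
    # they have the same length and differ only in the in-place opcode
    # and the loaded constant (1 vs. -1).
    dis.dis(add_loop)
    dis.dis(sub_loop)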

    Looking at the Python source, I can make a guess. This is handled in ceval.c, in PyEval_EvalFrameEx. INPLACE_ADD has a significant extra block of code, to handle string concatenation. That block doesn't exist in INPLACE_SUBTRACT, since you can't subtract strings. That means INPLACE_ADD contains more native code. Depending (heavily!) on how the code is being generated by the compiler, this extra code may be inline with the rest of the INPLACE_ADD code, which means additions can hit the instruction cache harder than subtraction. This could be causing extra L2 cache hits, which could cause a significant performance difference.

    This is heavily dependent on the system you're on (different processors have different amounts of cache and cache architectures), the compiler in use, including the particular version and compilation options (different compilers will decide differently which bits of code are on the critical path, which determines how assembly code is lumped together), and so on.

    Also, the difference is reversed in Python 3.0.1 (+: 15.66, -: 16.71); no doubt this critical function has changed a lot.

    0 讨论(0)
  • 2020-12-13 13:35

    It's always a good idea when asking a question to say what platform and what version of Python you are using. Sometimes it doesn't matter. This is NOT one of those times:

    1. time.clock() is appropriate only on Windows. Throw away your own measuring code and use -m timeit as demonstrated in pixelbeat's answer (see the timeit sketch after this list).

    2. Python 2.X's range() builds a list. If you are using Python 2.x, replace range with xrange and see what happens.

    3. Python 3.X's int is Python2.X's long.
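
    As a rough illustration of points 1 and 2 (my own sketch, not part of this answer), the timeit module together with xrange measures both operations without any hand-rolled clock code; the iteration and repeat counts below are arbitrary:

    import timeit

    # 'x = 0' runs once as setup; each statement runs inside timeit's timing loop.
    add_time = timeit.timeit("for i in xrange(1000000): x += 1",
                             setup="x = 0", number=10)
    sub_time = timeit.timeit("for i in xrange(1000000): x -= -1",
                             setup="x = 0", number=10)
    print '+=', add_time
    print '-=', sub_time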

  • 2020-12-13 13:37

    I think the "general programming lesson" is that it is really hard to predict, solely by looking at the source code, which sequence of statements will be the fastest. Programmers at all levels frequently get caught out by this sort of "intuitive" optimisation. What you think you know may not necessarily be true.

    There is simply no substitute for actually measuring your program performance. Kudos for doing so; answering why undoubtedly requires delving deep into the implementation of Python, in this case.

    With byte-compiled languages such as Java, Python, and .NET, it is not even sufficient to measure performance on just one machine. Differences between VM versions, native code translation implementations, CPU-specific optimisations, and so on will make this sort of question ever more tricky to answer.

  • 2020-12-13 13:40

    "Is there a more general programming lesson here?"

    The more general programming lesson here is that intuition is a poor guide when predicting run-time performance of computer code.

    One can reason about algorithmic complexity, hypothesise about compiler optimisations, estimate cache performance, and so on. However, since these things can interact in non-trivial ways, the only way to be sure how fast a particular piece of code will be is to benchmark it in the target environment (as you have rightly done).

  • 2020-12-13 13:43

    "The second loop is reliably faster ..."

    That's your explanation right there. Re-order your script so the subtraction test is timed first, then the addition, and suddenly addition becomes the faster operation again:

    -= 3.05
    += 2.84
    

    Clearly something about running second makes that half of the script faster. My guess is that the first call to range() is slower because Python needs to allocate enough memory for such a long list, but it can re-use that memory for the second call to range():

    import time
    # Time the first range() call, which must allocate a 10-million-element list.
    start = time.clock()
    x = range(10000000)
    end = time.clock()
    del x
    print 'first range()', end - start
    # Time a second, identical call; it can re-use the freed memory.
    start = time.clock()
    x = range(10000000)
    end = time.clock()
    print 'second range()', end - start
    

    A few runs of this script show that the extra time needed for the first range() accounts for nearly all of the time difference between '+=' and '-=' seen above:

    first range() 0.4
    second range() 0.23
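
    As a follow-up sketch (my addition, not from this answer), building the list once, outside both timed regions, removes that one-off allocation cost from either measurement:

    import time

    # Allocate the list a single time so neither timed loop pays for it.
    numbers = range(10000000)

    start = time.clock()
    x = 0
    for i in numbers:
        x += 1
    print '+=', time.clock() - start

    start = time.clock()
    x = 0
    for i in numbers:
        x -= -1
    print '-=', time.clock() - start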
    