While I experimented with measuring time of execution of arithmetic operations, I came across very strange behavior. A code block containing a for loop with one
Processors are so complex these days that we can only guess.
The assembly emitted by your compiler is not what is really executed. The microcode/firmware/whatever of your CPU will interpret it and turn it into instructions for its execution engine, much like JIT languages such as C# or java do.
One thing to consider here is that for each loop, there is not 1 or 2 instructions, but n + 2, as you also increment and compare i to your number of iteration. In the vast majority of case it wouldn't matter, but here it does, as the loop body is so simple.
Let's see the assembly :
Some defines:
#define NUM_ITERATIONS 1000000000ll
#define X_INC 17
#define Y_INC -31
C/C++ :
for (long i = 0; i < NUM_ITERATIONS; i++) { x+=X_INC; }
ASM :
mov QWORD PTR [rbp-32], 0
.L13:
cmp QWORD PTR [rbp-32], 999999999
jg .L12
add QWORD PTR [rbp-24], 17
add QWORD PTR [rbp-32], 1
jmp .L13
.L12:
C/C++ :
for (long i = 0; i < NUM_ITERATIONS; i++) {x+=X_INC; y+=Y_INC;}
ASM:
mov QWORD PTR [rbp-80], 0
.L21:
cmp QWORD PTR [rbp-80], 999999999
jg .L20
add QWORD PTR [rbp-64], 17
sub QWORD PTR [rbp-72], 31
add QWORD PTR [rbp-80], 1
jmp .L21
.L20:
So both Assemblies look pretty similar. But then let's think twice : modern CPUs have ALUs which operate on values which are wider than their register size. So there is a chance than in the first case, the operation on x and i are done on the same computing unit. But then you have to read i again, as you put a condition on the result of this operation. And reading means waiting.
So, in the first case, to iterate on x, the CPU might have to be in sync with the iteration on i.
In the second case, maybe x and y are treated on a different unit than the one dealing with i. So in fact, your loop body runs in parallel than the condition driving it. And there goes your CPU computing and computing until someone tells it to stop. It doesn't matter if it goes too far, going back a few loops is still fine compared to the amount of time it just gained.
So, to compare what we want to compare (one operation vs two operations), we should try to get i out of the way.
One solution is to completely get rid of it by using a while loop: C/C++:
while (x < (X_INC * NUM_ITERATIONS)) { x+=X_INC; }
ASM:
.L15:
movabs rax, 16999999999
cmp QWORD PTR [rbp-40], rax
jg .L14
add QWORD PTR [rbp-40], 17
jmp .L15
.L14:
An other one is to use the antequated "register" C keyword: C/C++:
register long i;
for (i = 0; i < NUM_ITERATIONS; i++) { x+=X_INC; }
ASM:
mov ebx, 0
.L17:
cmp rbx, 999999999
jg .L16
add QWORD PTR [rbp-48], 17
add rbx, 1
jmp .L17
.L16:
Here are my results:
x1 for: 10.2985 seconds. x,y = 17000000000,0
x1 while: 8.00049 seconds. x,y = 17000000000,0
x1 register-for: 7.31426 seconds. x,y = 17000000000,0
x2 for: 9.30073 seconds. x,y = 17000000000,-31000000000
x2 while: 8.88801 seconds. x,y = 17000000000,-31000000000
x2 register-for :8.70302 seconds. x,y = 17000000000,-31000000000
Code is here: https://onlinegdb.com/S1lAANEhI