This question is about how an integer gets multiplied by a constant. So let's look at a simple function:
int f(int x) {
    return 10*x;
}
How c
I'd guess that the shift-and-add sequence is faster than imul; this has been true for many generations of x86 chips. I don't know whether it is true of Haswell; still, doing an imul in 2 clock cycles takes significant chip resources, if it is doable at all.
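To make "shift-and-add" concrete, here is a minimal sketch in C of one possible decomposition, 10*x = 8*x + 2*x. The names mul10_shift_add and mul10_plain are made up for illustration, and this is not necessarily the exact sequence the compiler emitted; I use unsigned arithmetic so that wraparound is well defined.

```c
#include <stdint.h>

/* Hypothetical helper: computes 10*x as (8*x) + (2*x),
   i.e. two shifts and one add instead of a multiply.
   Unsigned types keep overflow (wraparound) well defined. */
uint32_t mul10_shift_add(uint32_t x) {
    return (x << 3) + (x << 1);
}

/* Reference version that leaves the instruction choice to the compiler. */
uint32_t mul10_plain(uint32_t x) {
    return 10u * x;
}
```

Both functions should agree for every input, including ones where the product wraps around; which one is faster depends on the chip, as said above.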
I'm a bit surprised it didn't produce an even faster sequence:
lea y, [2*y]        ; y = 2*y
lea y, [y + 4*y]    ; y = 5*y, i.e. 10 times the original value
[OP edits his answer, shows optimized code producing ADD then LEA. Yes, that's a better answer; the ADD r,r has a shorter encoding than lea ..[2*y], so the resulting code is smaller and runs at the same speed.]
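For what it's worth, the ADD-then-LEA idea can be sketched in C roughly like this. The function name mul10_add_lea is hypothetical, and the comments only indicate which instruction each step is meant to correspond to; a compiler is free to emit something else.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the two-step sequence:
   double the value first, then multiply by 5 as t + 4*t. */
uint32_t mul10_add_lea(uint32_t x) {
    uint32_t t = x + x;     /* add r, r          : t = 2*x          */
    return t + (t << 2);    /* lea r, [r + 4*r]  : 5*t = 10*x       */
}
```

The point of the split is that 5*t fits a single LEA (base plus index scaled by 4), while the doubling is a plain register add.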