I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a
gcc actually can do this optimization, even for floating-point numbers. For example,
double foo(double a) {
return a*a*a*a*a*a;
}
becomes
foo(double):
mulsd %xmm0, %xmm0
movapd %xmm0, %xmm1
mulsd %xmm0, %xmm1
mulsd %xmm1, %xmm0
ret
with -O -funsafe-math-optimizations. This reordering violates IEEE-754, though, so it requires the flag.
Signed integers, as Peter Cordes pointed out in a comment, can do this optimization without -funsafe-math-optimizations since it holds exactly when there is no overflow and if there is overflow you get undefined behavior. So you get
foo(long):
movq %rdi, %rax
imulq %rdi, %rax
imulq %rdi, %rax
imulq %rax, %rax
ret
with just -O. For unsigned integers, it's even easier since they work mod powers of 2 and so can be reordered freely even in the face of overflow.