Is an extra move somehow faster when doing division-by-multiplication?
Question

Consider this function:

    unsigned long f(unsigned long x) {
        return x / 7;
    }

With -O3, Clang turns the division into a multiplication, as expected:

    f:                                      # @f
            movabs  rcx, 2635249153387078803
            mov     rax, rdi
            mul     rcx
            sub     rdi, rdx
            shr     rdi
            lea     rax, [rdi + rdx]
            shr     rax, 2
            ret

GCC does basically the same thing, except for using rdx where Clang uses rcx. But they both appear to be doing an extra move. Why not this instead?

    f:
            movabs  rax, 2635249153387078803
            mul     rdi
            sub     rdi, rdx
            shr     rdi
            lea     rax, [rdi + rdx
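For readers who want to see what the instruction sequence in Clang's output computes, here is a minimal C sketch of the same arithmetic. The names `div7_by_mul` and `check_div7` are hypothetical and do not come from the post, and the code assumes a compiler that supports the `unsigned __int128` extension (GCC and Clang on x86-64 do). It only illustrates the multiply-and-shift math; it says nothing about which register assignment is faster.

```c
#include <assert.h>
#include <stdint.h>

/* Constant from the compiler output; it equals ceil(2^67 / 7) - 2^64. */
#define MAGIC 2635249153387078803ULL

/* Hypothetical helper: computes x / 7 without a divide instruction. */
uint64_t div7_by_mul(uint64_t x)
{
    /* High 64 bits of the 128-bit product, which `mul` leaves in rdx. */
    uint64_t hi = (uint64_t)(((unsigned __int128)x * MAGIC) >> 64);

    /* sub rdi, rdx ; shr rdi  (x >= hi always holds, so no underflow) */
    uint64_t t = (x - hi) >> 1;

    /* lea rax, [rdi + rdx] ; shr rax, 2 */
    return (t + hi) >> 2;
}

/* Hypothetical self-check against the ordinary division operator. */
void check_div7(uint64_t x)
{
    assert(div7_by_mul(x) == x / 7);
}
```

Calling `check_div7` on a few values, including `UINT64_MAX`, is a quick way to confirm that the sequence matches `x / 7`. The subtract, halve, then add-back step exists because the full multiplier, 2^64 plus the constant above, does not fit in 64 bits.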