In C, why is signed int faster than unsigned int?

True, I know that this has been asked and answered multiple times on this website (links below).
From the AMD/Intel instruction specifications, we have (for K7):
Instruction     Ops   Latency   Throughput
DIV r32/m32     32    24        23
IDIV r32        81    41        41
IDIV m32        89    41        41
For i7, latency and throughput are the same for IDIVL and DIVL; only the µop counts differ slightly.
This may explain the difference, since on my machine the -O3 assembly output for the two versions differs only in the division instruction (DIVL vs. IDIVL).
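
To make the DIVL vs. IDIVL comparison concrete, here is a minimal sketch of the kind of function pair that would differ only in signedness. The function names and loop shape are hypothetical, not the original benchmark, and the instructions mentioned in the comments are what an x86 compiler would typically emit when the divisor is only known at run time; your compiler may do something else.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the original benchmark.
 * The caller must pass a non-zero divisor d. */

/* Sums i / d for i in [0, n). Unsigned operands: on x86 the division
 * is typically compiled to DIV when d is not a compile-time constant. */
uint32_t sum_quotients_unsigned(uint32_t n, uint32_t d)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; ++i)
        acc += i / d;
    return acc;
}

/* Same loop with signed operands: the division is typically compiled
 * to IDIV, and the quotient truncates toward zero. */
int32_t sum_quotients_signed(int32_t n, int32_t d)
{
    int32_t acc = 0;
    for (int32_t i = 0; i < n; ++i)
        acc += i / d;
    return acc;
}
```

Compiling a file containing these with something like `gcc -O3 -S` and comparing the two loop bodies should show whether the division instruction is the only difference on a given machine.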