The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.
typedef long lon
I've converted it to intel syntax.
mov esi, y_low
mov eax, x
mov edx, eax
sar edx, 31
mov ecx, y_high
imul ecx, eax ; ecx = y_high *{signed} x
mov ebx, edx
imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low
add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low
mul esi ; edx:eax = x_low *{unsigned} y_low
lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)
mov ecx, dest
mov [ecx], eax
mov [ecx + 4], edx
What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.
Where does the other 64-bit multiplicand come from? It's x
sign-extended from 32 bits to 64. The sar
instruction is used to replicate x's
sign bit into all bits of edx
. I call this value consisting only of the x's
sign x_high
. x_low
is the value of x
actually passed into the routine.
y_low
and y_high
are the least and most significant parts of y
, just like x's
x_low
and x_high
are.
From here it's pretty easy:
product = y
*{signed} x
=
(y_high
* 232 + y_low
) *{signed} (x_high
* 232 + x_low
) =
y_high
*{signed} x_high
* 264 +
y_high
*{signed} x_low
* 232 +
y_low
*{signed} x_high
* 232 +
y_low
*{signed} x_low
y_high
*{signed} x_high
* 264 isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).
y_low
*{signed} x_low
is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
-1 *{signed} -1 = 1
0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)
Consider the context of line 8 and 9.
By this time, ESI
contains the lower half of y
and EBX
contains sgn(x)
. So line 8 is just computing sgn(x) * (y % 2^32)
and storing it in EBX
.
Line 9 draws upon that result. By the time Line 9 happens, ECX
contains a partial upper half of the multiplication, that is, x * (y >> 32)
signed. So EBX+ECX
ends up being what we computed in the last step plus the partial upper half we found on a previous line.
The full algorithm itself is pretty neat ;)
EDIT: In response to a comment below...
Line 4: Consider what SAR EDX, 31
(or if you like, sar $31, %edx
) really means. Since EDX
is a 32-bit register, you'll end up with one of two values. Which two? Consider what they mean in the context of signed arithmetic.
Line 7: EDX
by this point contains something pretty useful for the following operations. I'm just moving it where it needs to go.
What imul does is multiplies the contents of eax with ecx and saves the lower 32 bits in eax and higher 32 bits in edx.
addl as far as I remember adds the two registers and saves it on the first one so in this case ebx. (I am not sure if it does anything else and the l after addl stands for long)