问题
I'm teaching myself some assembly programming with x86-64 Mac OS. I'm trying to figure out why when it comes to dividing a positive integer with a negative integer gives me an overflow. For example, 5/-2 must return -2. However, in my case, it returns a 2147483371 when I do -554/2 instead of -277... This is what I have in my assembly file:
; compiling using: nasm -f macho64 -o divide.o divide.s
[bits 64]
global _divide
section .text
; int divide(int dividend, int divisor)
_divide:
xor rdx, rdx ; making this to 0
push rbp ; base stack pointer
mov rax, rdi ; dividend
mov rcx, rsi ; divisor
idiv rcx ; integer division
add rsp, 8
ret
In my main.c file, I have this:
#include <stdio.h>
extern int divide(int dividend, int divisor);
int main(void)
{
printf("divide: %d\n\n", divide(-554,2));
return (0);
}
OUTPUT: divide: 2147483371
Can someone explain to me what exactly I'm doing wrong?
回答1:
The 32-bit value -554signed is equivalent to 4,294,966,742unsigned and half of that is indeed 2,147,483,371, the answer you're getting. So it looks like a signed/unsigned issue. And, upon examining the x86 docs for idiv, we see:
IDIV r/m64 Signed divide RDX:RAX by r/m64, result stored in:
RAX <- Quotient,
RDX <- Remainder.
Note that first line, specifically the "signed divide rdx:rax by" bit. When Intel talk about rdx:rax, they mean the 128-bit value formed from those two 64-bit registers. Assuming that those two 64-bit registers contain the (hex) values:
rax : 01234567 89ABCDEF
rdx : 11112222 FFFFEEEE
then the rdx:rax value will be the 128-bit value:
rdx:rax : 11112222 FFFFEEEE 01234567 89ABCDEF
Now, because you are zeroing out rdx, the combined value is considered positive because the top bit is zero. What you actually need to do is sign-extend rax into rdx:rax, a method that preserves the sign in the extended value. For example, consider the 32-bit -1, sign-extended properly and improperly into a 64-bit value:
ffffffff 32-bit: -1.
ffffffff ffffffff 64-bit proper: -1.
00000000 ffffffff 64-bit improper: 4,294,967,295.
To sign-extend properly, the leftmost bits (rdx in your case) should be all one-bits if the rightmost bits (rax for you) form a negative number, all zero-bits otherwise.
Of course, those clever Intel engineers have already thought of just this use case, so you can do it with the cqo convert-quadword-to-octoword instruction, which sign extends correctly. With that in mind, your code for setting eax would become:
mov rax, rdi ; Get dividend and
cqo ; sign extend to rdx:rax.
However, you may have an extra problem as well. Even though the System V x86-64 ABI specifies that the parameters are passed in the 64-bit registers (rXX), it's possible that passing 32-bit values will actually leave the upper bits containing rubbish (and I think you're allowed to leave rubbish in the upper parts of the return value as well. See this excellent answer for details.
So you should not assume that you will have a sane value in the entire 64-bit register, only in the rightmost 32 bits.
In your case (assuming 32-bit integers), you should be sign extending 32-to-64 rather than 64-to-128, and using a smaller-width division instruction. That would result in something more like:
global _divide
section .text
; int32_t divide(int32_t ediDividend, int32_t esiDivisor)
_divide:
mov eax, edi ; Get 32-bit dividend and
cdq ; sign extend to 64-bit edx:eax.
idiv esi ; Weave magic here,
; zeros leftmost rax.
ret ; Return quotient in rax/eax.
This is untested, but should do what you want. I've actually removed the pushing of rbp since I'm pretty certain it's not necessary. It appears not to be corrupted (this function neither changes it, nor calls any other function which could change it), and it appears you never actually restored it properly in your original code anyway.
回答2:
Your code is also broken for negative divisors: divide(5,-2) will give zero. This is purely explained by calling-convention. Your zero-extension instead of sign-extension bug (see @paxdiablo's answer) only matters for negative dividends.
You told the compiler your function takes int args, and int is a 32-bit type in the x86-64 System V calling convention.
You're assuming your inputs are sign-extended to 64-bit, but the calling convention doesn't require that, so the compiler won't waste code-size on 10-byte mov r64, imm64 when it can use 5-byte mov r32, imm32 instead.
For more details, see these Q&As. (the 2nd is basically a duplicate of the first):
- Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?
- Is garbage allowed in high bits of parameter and return value registers in x86-64 SysV ABI?
Thus your compiler will emit code like this for your main:
mov edi, 5 ; RDI = 0x0000000000000002
mov esi, -2 ; RSI = 0x00000000FFFFFFFE
call _divide
I checked on the Godbolt compiler explorer, and that's what gcc and clang really do1, even for un-optimized code.
For divide(5,-2), your code will result in
- RDX=0, RAX=5. i.e. dividend = 0x0000000000000000:0000000000000005, which is correct. (zero- and sign-extension are the same operation for non-negative inputs).
- divisor = 0x00000000FFFFFFFE = +4294967294, which is large and positive.
64-bit idiv calculates 5 / 4294967294 producing quotient=RAX=0, remainder=RDX=5.
If you only fixed the type-width / operand-size mismatch bug, you'd still have problems with negative dividends like @paxdiablo's answer explains. But both fixes are necessary for divide(-554,2) to actually work.
So how should you have written it?
You could change the prototype to int64_t or long (which is 64-bit in x86-64 System V), and use cqo to set up for signed division. (When and why do we sign extend and use cdq with mul/div?)
Or you could sign-extend your 32-bit inputs to 64-bit, with movsxd rax, edi / movsxd rcx, esi. But that would be silly. Just use 32-bit operand-size since that's what you told the compiler to pass.
That's good because 64-bit division is much slower than 32-bit division. (https://agner.org/optimize/, and C++ code for testing the Collatz conjecture faster than hand-written assembly - why?).
This is what I'd do:
global _divide
; inputs: int32_t dividend in EDI, int32_t divisor in ESI
; output: int32_t quotient in EAX, int32_t remainder in EDX
; (C callers won't be able to access the remainder, unfortunately)
_divide:
mov eax, edi
cdq ; sign-extend the dividend into edx:eax
idiv esi ; no need to copy to ecx/rcx first
ret
No need to push RBP; we're not calling any other functions so realigning the stack doesn't matter, and we're not modifying RBP for use as a frame pointer.
We're allowed to clobber RDX without saving/restoring it: it's a call-clobbered register in x86-64 System V, and Windows x64. (Same as in most 32-bit calling conventions). This makes sense because it's used implicitly by some common instructions like idiv.
That's what gcc and clang do emit (with optimization enabled of course) if you write it in C.
int divide(int dividend, int divisor) {
return dividend / divisor;
}
(See the Godbolt link above, where I included it with __attribute__((noinline)) so I could still see main actually setting up function args. I could have just named it something else instead.)
As usual, looking at compiler output to see the difference between your code and what the compiler did can clue you in to something you did wrong. (Or give you a better starting point for optimization. In this case the compilers don't have any missed optimizations, though.) See How to remove "noise" from GCC/clang assembly output?.
You can change the types to long (which is 64-bit in x86-64 System V, unlike in Windows x64) if you want to see code-gen for 64-bit integers. And also see how the caller changes, e.g.
mov edi, 5
mov rsi, -2
call _divide
Footnote 1: Interestingly clang -O3's asm output has mov esi, -2, but clang -O0 writes it as mov edi, 4294967294.
Those both assemble to the same instruction, of course, zeroing the upper 32 bits of RDI, because that's how AMD designed AMD64, rather than for example implicitly sign-extending into the full register, which would have been a valid design choice but probably not quite as cheap as zero-extending.
And BTW, Godbolt has compilers targeting Linux, but that's the same calling convention. The only difference is that OS X decorates function names with a leading _ but Linux doesn't.
来源:https://stackoverflow.com/questions/51717317/dividing-with-a-negative-number-gives-me-an-overflow-in-nasm