Question
Consider the following program.
#include <stdio.h>
int negative(int A) {
    return (A & 0x80000000) != 0;
}
int divide(int A, int B) {
    printf("A = %d\n", A);
    printf("negative(A) = %d\n", negative(A));
    if (negative(A)) {
        A = ~A + 1;
        printf("A = %d\n", A);
        printf("negative(A) = %d\n", negative(A));
    }
    if (A < B) return 0;
    return 1;
}
int main() {
    divide(-2147483648, -1);
}
When it is compiled without compiler optimizations, it produces the expected results.
gcc -Wall -Werror -g -o TestNegative TestNegative.c
./TestNegative
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 1
When it is compiled with compiler optimizations, it produces the following incorrect output.
gcc -O3 -Wall -Werror -g -o TestNegative TestNegative.c
./TestNegative
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 0
I am running gcc version 5.4.0.
Is there a change I can make in the source code to prevent the compiler from producing this behavior under -O3?
Answer 1:
-2147483648 does not do what you think it does. C doesn't have negative constants. Include limits.h and use INT_MIN instead (pretty much every INT_MIN definition on two's complement machines defines it as (-INT_MAX - 1) for a good reason). A = ~A + 1; invokes undefined behavior because ~A + 1 causes integer overflow.
It's not the compiler, it's your code.
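A minimal sketch of that suggestion, assuming you also want to keep the negation well defined (the INT_MIN guard is my addition, not part of the original answer):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int A = INT_MIN;              /* instead of spelling out -2147483648 */
    if (A < 0 && A != INT_MIN) {  /* INT_MIN has no in-range positive counterpart */
        A = -A;                   /* safe for every other negative value */
    }
    printf("A = %d\n", A);        /* prints -2147483648 when A == INT_MIN */
    return 0;
}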
Answer 2:
The compiler replaces your A = ~A + 1; statement with a single neg instruction, i.e. this code:
int just_negate(int A) {
    A = ~A + 1;
    return A;
}
will be compiled to:
just_negate(int):
    mov  eax, edi
    neg  eax      // just negate the input parameter
    ret
But the compiler is also smart enough to realize that, if A & 0x80000000 was non-zero before negation, it must be zero after negation, unless you are relying on undefined behavior.
This means that the second printf("negative(A) = %d\n", negative(A)); can be "safely" optimized to:
    mov  edi, OFFSET FLAT:.LC0  // .string "negative(A) = %d\n"
    xor  eax, eax               // just set eax to zero
    call printf
I use the online godbolt compiler explorer to check the assembly for various compiler optimizations.
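For example, a stripped-down reproduction of my own (not the question's exact code) makes the effect easy to see there:

/* Hypothetical reduction of the question's pattern. Absent undefined behavior
   the result is 0 for every input: a non-negative A fails the first test, and a
   negated negative A is non-negative again (the lone exception, A == INT_MIN,
   overflows, which is UB). So -O3 may well compile the whole function down to
   "return 0", just like the second printf above. */
int negate_then_test(int A) {
    if ((A & 0x80000000) != 0) {   /* A is negative here */
        A = ~A + 1;                /* recognized as negation; UB for A == INT_MIN */
    }
    return (A & 0x80000000) != 0;
}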
Answer 3:
To explain in detail what's going on here:
In this answer I'm assuming that long is 32 bits and long long is 64 bits. This is the most common case, but not guaranteed.

C does not have negative integer constants. -2147483648 is actually the constant 2147483648 with the unary minus operator applied to it. The compiler picks the type of the integer constant after checking whether 2147483648 can fit:
- Inside an int? No it cannot.
- Inside a long? No it cannot.
- Inside a long long? Yes it can.
So the type of the integer constant is long long, and the unary minus is then applied to that long long.
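A quick way to see this for yourself (a small sketch of my own; the sizes assume the 32-/64-bit setup above):

#include <stdio.h>

int main(void) {
    /* 2147483648 doesn't fit in int, so the constant gets a wider type
       (long long under this answer's assumptions; long on LP64 systems). */
    printf("%zu\n", sizeof 2147483648);   /* prints 8, not 4 */
    /* Unary minus changes the value, not the type: */
    printf("%zu\n", sizeof -2147483648);  /* still 8 */
    printf("%zu\n", sizeof(int));         /* 4, for comparison */
    return 0;
}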
Then you try to pass this negative long long to a function expecting an int. A good compiler might warn here; you force an implicit conversion to a smaller type. However, assuming 2's complement, the value -2147483648 fits inside an int, so the conversion does not need to fall back on implementation-defined behavior, which would otherwise have been the case.
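To make that conversion step concrete, a small sketch of my own (assuming 32-bit int and 2's complement):

#include <stdio.h>

int main(void) {
    long long big = -2147483648;  /* the wide, negated constant from the question */
    int narrowed = big;           /* the value fits in int, so the conversion is exact;
                                     gcc -Wconversion would still point it out */
    printf("%lld %d\n", big, narrowed);  /* -2147483648 -2147483648 */
    return 0;
}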
The next tricky part is the function negative, where you use 0x80000000. This is not an int either, nor is it a long long, but an unsigned int (see this for an explanation).

When comparing your passed int with an unsigned int, "the usual arithmetic conversions" (see this) force an implicit conversion of the int to unsigned int. It doesn't affect the result in this specific case, but it is why gcc -Wconversion users get a nice warning here.

(Hint: enable -Wconversion already! It is good for catching subtle bugs, but it is not part of -Wall or -Wextra.)

Next you do ~A, a bitwise inverse of the binary representation of the value, ending up with 0x7FFFFFFF. This is, as it turns out, the same value as INT_MAX on your 32- or 64-bit system. Thus 0x7FFFFFFF + 1 gives a signed integer overflow, which is undefined behavior. This is why the program misbehaves.

Cheekily, we could change the code to A = ~A + 1u; and suddenly everything works as expected, again because of the implicit conversion of ~A to unsigned int.
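Spelled out, with the intermediate values as comments (a sketch of my own; fixed_negate is a hypothetical helper, not part of the question's code):

int fixed_negate(int A) {
    /* For A == INT_MIN: ~A is INT_MAX (0x7FFFFFFF). The unsigned constant 1u makes
       the usual arithmetic conversions turn ~A into unsigned int first, so the
       addition happens in unsigned arithmetic (2147483647u + 1u == 2147483648u),
       with no signed overflow and no UB. Converting that back to int is
       implementation-defined; gcc documents it as reduction modulo 2^32, which
       gives INT_MIN again. */
    A = ~A + 1u;
    return A;
}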
Lessons learned:
In C, integer constants, as well as implicit integer promotions, are very dangerous and unintuitive. They can subtly change the meaning of the program completely and introduce bugs. At each and every operation in C, you need to consider the actual types of the operands involved.
Playing around with C11 _Generic could be a good way to see the actual types. Example:
#define TYPE_SAFE(val, type) _Generic((val), type: val)
...
(void) TYPE_SAFE(-2147483648, int); // won't compile, type is long or long long
(void) TYPE_SAFE(0x80000000, int); // won't compile, type is unsigned int
Good safety measures to protect yourself from bugs like these are to always use stdint.h and to follow MISRA C.
Answer 4:
You are relying on undefined behavior. 0x7fffffff + 1 for 32 bit signed integers results in signed integer overflow, which is undefined behavior according to the standard, so anything goes.
In gcc you can force wraparound behavior by passing -fwrapv; still, if you have no control over the flags (and more generally, if you want a more portable program) you should do all these tricks on unsigned integers, which are required by the standard to wrap around (and have well-defined semantics for bitwise operations, unlike signed integers).
First convert the int to unsigned (well defined according to the standard, and yields the expected result), do your stuff, then convert back to int; that last step is implementation-defined (≠ undefined) for values outside the range of int, but every compiler working in 2's complement actually defines it to do the "right thing".
int divide(int A, int B) {
    printf("A = %d\n", A);
    printf("negative(A) = %d\n", negative(A));
    if (negative(A)) {
        A = ~((unsigned)A) + 1;
        printf("A = %d\n", A);
        printf("negative(A) = %d\n", negative(A));
    }
    if (A < B) return 0;
    return 1;
}
Your version (at -O3):
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 0
My version (at -O3):
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 1
Source: https://stackoverflow.com/questions/48761870/how-can-i-prevent-the-gcc-optimizer-from-producing-incorrect-bit-operations