Question
In the following code, I memset() a stdbool.h bool variable to value 123. (Perhaps this is undefined behaviour?) Then I pass a pointer to this variable to a victim function, which tries to protect against unexpected values using a conditional operation. However, GCC for some reason seems to remove the conditional operation altogether.
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
void victim(bool* foo)
{
    int bar = *foo ? 1 : 0;
    printf("%d\n", bar);
}
int main()
{
    bool x;
    bool *foo = &x;
    memset(foo, 123, sizeof(bool));
    victim(foo);
    return 0;
}
user@host:~$ gcc -Wall -O0 test.c
user@host:~$ ./a.out
123
What makes this particularly annoying is that the victim() function is actually inside a library, and will crash if the value is more than 1.
Reproduced on GCC versions 4.8.2-19ubuntu1 and 4.7.2-5. Not reproduced on clang.
Answer 1:
(Perhaps this is undefined behaviour?)
Not directly, but reading from the object afterwards is.
Quoting C99:
6.2.6 Representations of types
6.2.6.1 General
5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...]
Basically, what this means is that if a particular implementation has decided that the only two valid bytes for a bool are 0 and 1, then you'd better make sure you don't use any trickery to attempt to set it to any other value.
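The quoted paragraph exempts reads through an lvalue of character type, so the stored byte can still be inspected safely. The following is a minimal sketch of that idea, not something taken from the question: the helper names bool_first_byte and bool_repr_is_valid are hypothetical, and the validity check assumes sizeof(bool) == 1 with 0 and 1 as the only valid representations, as with GCC on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return the first byte of the bool's object
   representation. The read goes through unsigned char, which the
   quoted C99 paragraph exempts from the undefined-behaviour rule. */
static unsigned char bool_first_byte(const bool *p)
{
    assert(p != NULL);
    return *(const unsigned char *)p;
}

/* Hypothetical helper: true only if the stored byte is 0 or 1.
   Assumption: sizeof(bool) == 1 and 0/1 are the only valid
   representations on this implementation. */
static bool bool_repr_is_valid(const bool *p)
{
    return bool_first_byte(p) <= 1;
}
```

A caller could run bool_repr_is_valid(&x) on a suspect variable before handing its address to code that reads it as a bool.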
Answer 2:
When GCC compiles this program, the assembly language output includes the sequence
movzbl (%rax), %eax
movzbl %al, %eax
movl %eax, -4(%rbp)
which does the following:
- Copy one byte from *foo (denoted by (%rax) in assembly) to the register %eax and fill in the upper 24 bits of %eax with zeros.
- Copy the low-order 8 bits of %eax (denoted by %al) to %eax and fill in the higher-order bits of %eax with zeros. As a C programmer you would understand this as %eax &= 0xff. (After the first instruction this step changes nothing.)
- Copy the value of %eax to the 4 bytes just below %rbp, which is the location of bar on the stack.
So this code is an assembly-language translation of
int bar = *foo & 0xff;
Clearly GCC has optimized the line based on the fact that a bool should never hold any value other than 0 or 1.
If you change the relevant line in the C source to this
int bar = *((int*)foo) ? 1 : 0;
then the assembly changes to
movl (%rax), %eax
testl %eax, %eax
setne %al
movzbl %al, %eax
movl %eax, -4(%rbp)
which does the following:
- Copy 32 bits from *foo (denoted by (%rax) in assembly) to the register %eax.
- Test 32 bits of %eax against itself, which means ANDing it with itself and setting some flags in the processor based on the result. (The result of the AND is discarded; only the flags matter here.)
- Set the low-order 8 bits of %eax (denoted by %al) to 1 if the result of the ANDing was nonzero, or to 0 otherwise.
- Copy the low-order 8 bits of %eax (denoted by %al) to %eax and fill in the higher-order bits of %eax with zeros, as in the first snippet.
- Copy the value of %eax to the 4 bytes just below %rbp, which is the location of bar on the stack; also as in the first snippet.
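This is actually a faithful translation of the C code. And indeed, if you add the cast to (int*) and compile and run the program, you'll see that it does output 1. Note, though, that the cast reads sizeof(int) bytes from a one-byte bool object, which is itself undefined behaviour, so it illustrates the point rather than fixing the problem.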
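If the goal is to force a real zero/nonzero test without reading past the object, one option is to copy only the bool's own bytes into an unsigned char buffer and test those. This is a hedged sketch rather than anything the question or the library does; the function name bool_bytes_nonzero is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for "*foo ? 1 : 0": returns 1 if any byte
   of the bool's object representation is nonzero, else 0. Only
   sizeof(bool) bytes are read, and only as unsigned char, so the
   compiler cannot assume the value is already 0 or 1. */
static int bool_bytes_nonzero(const bool *foo)
{
    unsigned char bytes[sizeof(bool)];
    size_t i;

    assert(foo != NULL);
    memcpy(bytes, foo, sizeof bytes);   /* byte-wise copy, no bool read */
    for (i = 0; i < sizeof bytes; i++) {
        if (bytes[i] != 0)
            return 1;
    }
    return 0;
}
```

In victim(), the line computing bar could then become int bar = bool_bytes_nonzero(foo);, assuming the library source can be changed.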
Answer 3:
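Reading a bool whose stored representation is anything other than 0 or 1 is undefined behavior in C when the implementation treats those as the only valid representations, as GCC does.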
So actually this:
int bar = *foo ? 1 : 0;
is optimized with something close to this:
int bar = *foo ? *foo : 0;
Source: https://stackoverflow.com/questions/27661768/weird-results-for-conditional-operator-with-gcc-and-bool-pointers