This is a variation of the code from this tweet, just shorter and harmless to newcomers. We have this code:
typedef int (*Function)();
static Function DoSmth;
static int Return7()
{
    return 7;
}
void NeverCalled()
{
    DoSmth = Return7;
}
int main()
{
    return DoSmth();
}
You see that NeverCalled() is never called in the code, don't you? Here's what Compiler Explorer shows when clang 3.8 is selected with -Os -std=c++11 -Wall. The emitted code is:
NeverCalled():
retq
main:
movl $7, %eax
retq
as if NeverCalled() had actually been called before DoSmth() and had set the DoSmth function pointer to the Return7() function.
If the function pointer assignment is removed from inside NeverCalled(), as here:
void NeverCalled() {}
then the emitted code is:
NeverCalled():
retq
main:
ud2
The latter is quite expected. The compiler knows that the function pointer is definitely null, and calling a function through a null function pointer is undefined behavior.
The former code is not really expected. Somehow the compiler decided to have Return7() called, although it's not called directly anywhere and the function pointer assignment is inside a function that is never called.
Yes, I know that the C++ Standard allows a compiler facing code with undefined behavior to do this. But how does it do it?
How does clang come to emit this specific machine code?
NeverCalled is a misnomer. Any global function is potentially called (by a constructor of a global object in a different translation unit, for example).
Incidentally, this is the only way this TU can possibly be incorporated in a program that doesn't have UB. In this case, main returns 7.
Make NeverCalled static, and main will compile to empty code.
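To illustrate the point about a constructor of a global object, here is a minimal sketch that puts such an object in the same file for brevity. Initializer, g_initializer and CallDoSmth() are hypothetical names that do not appear in the original code; CallDoSmth() stands in for the body of main() from the question. In a real program the global object would more plausibly live in another translation unit, which the compiler cannot see while it compiles this one.

```cpp
#include <cassert>

typedef int (*Function)();

static Function DoSmth;  // zero-initialised before any dynamic initialisation runs

static int Return7()
{
    return 7;
}

void NeverCalled()
{
    DoSmth = Return7;
}

// Hypothetical caller: a global object whose constructor calls NeverCalled().
struct Initializer
{
    Initializer() { NeverCalled(); }
};
static Initializer g_initializer;  // constructed before main() starts

// Stand-in for the body of main() from the question.
int CallDoSmth()
{
    // Holds in this sketch because g_initializer's constructor has already run.
    assert(DoSmth != nullptr);
    return DoSmth();
}
```

With the global object present, the call through DoSmth is well-defined and yields 7, which matches the machine code clang emitted for the original program.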
The path by which clang does this is probably something along the lines of:
- DoSmth is a static, so is zero initialised. Since it is a pointer (to function) that has the effect of initialisation to the NULL pointer (or nullptr);
- main() does return DoSmth(), so clang then reasons that DoSmth cannot be NULL, since that would cause return DoSmth() to exhibit undefined behaviour;
- It then reasons about other code in the compilation unit, and finds that there is an assignment DoSmth = Return7 in NeverCalled();
- Since that is the only statement in the compilation unit which sets DoSmth to be non-NULL, and it has reasoned that DoSmth is not NULL, clang assumes NeverCalled() must have been called somehow;
- As a result of the above reasoning clang concludes that DoSmth must be equal to the address of Return7;
- Since it has now reasoned that DoSmth == Return7, clang converts the return DoSmth() into return Return7();
- Return7() is in the same compilation unit, so clang inlines it.
The specifics of how clang does this internally are anyone's guess. However, various steps of code optimisation probably result in a reasoning chain something like the above.
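The reasoning chain can be modelled with a small toy program. This is only an illustration of the logic, not a description of clang's actual passes or data structures; ResolveOnlyPossibleCallee() and CallResolved() are hypothetical names invented for the sketch.

```cpp
#include <cassert>
#include <vector>

typedef int (*Function)();

static int Return7()
{
    return 7;
}

// Toy model: given every value the pointer could ever hold, return the
// unique non-null candidate, or nullptr if there is none or more than one.
Function ResolveOnlyPossibleCallee(const std::vector<Function>& possible_values)
{
    Function only = nullptr;
    for (Function candidate : possible_values)
    {
        if (candidate == nullptr)
            continue;  // calling through null would be UB, so assume it never happens
        if (only != nullptr && only != candidate)
            return nullptr;  // several possible targets: nothing can be concluded
        only = candidate;
    }
    return only;
}

int CallResolved()
{
    // DoSmth can hold its zero-initialised value or the single value
    // assigned to it in NeverCalled().
    const std::vector<Function> possible_values = {nullptr, Return7};
    Function callee = ResolveOnlyPossibleCallee(possible_values);
    assert(callee == Return7);  // the only candidate left after discarding null
    return callee();            // a real optimiser would then inline this call
}
```

Once the null candidate is discarded, only Return7 remains, so in this model the indirect call collapses to a direct call that can be inlined to the constant 7.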
The point is that your code - as it stands - has undefined behaviour. One cute feature of undefined behaviour is that a compiler is permitted (as distinct from required) to reason that your code actually has well-defined behaviour. In turn, that permits the compiler to reason that some code which makes the behaviour well-defined has been magically executed.
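For contrast, a version whose behaviour is well-defined on every path can be sketched by checking the pointer before the call. CallDoSmthOrDefault(), Demo() and the fallback value -1 are assumptions made for this example, not part of the original code. With the check in place the compiler should no longer be able to assume that the pointer is non-null, so both outcomes have to be preserved.

```cpp
#include <cassert>

typedef int (*Function)();

static Function DoSmth;  // zero-initialised to a null pointer, as in the question

static int Return7()
{
    return 7;
}

void NeverCalled()
{
    DoSmth = Return7;
}

// Hypothetical helper: the fallback value -1 is an arbitrary choice for this sketch.
int CallDoSmthOrDefault()
{
    if (DoSmth == nullptr)
        return -1;  // never calls through a null pointer, so no undefined behaviour
    return DoSmth();
}

// Usage sketch, assuming Demo() runs before anything else calls NeverCalled().
void Demo()
{
    assert(CallDoSmthOrDefault() == -1);  // pointer not assigned yet
    NeverCalled();
    assert(CallDoSmthOrDefault() == 7);   // pointer now refers to Return7
}
```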
Source: https://stackoverflow.com/questions/46272628/how-does-clang-manage-to-compile-this-code-with-undefined-behavior-into-this-mac