How does clang manage to compile this code with undefined behavior into this machine code?

Submitted by 元气小坏坏 on 2019-12-10 11:32:01

Question


This is a variation of the code from this tweet, only shorter and harmless to beginners. We have this code:

typedef int (*Function)();

static Function DoSmth;

static int Return7()
{
    return 7;
}

void NeverCalled()
{
   DoSmth = Return7;  
}

int main()
{
    return DoSmth();
}

You see that NeverCalled() is never called in the code, don't you? Here's what Compiler Explorer shows when clang 3.8 is selected with

-Os -std=c++11 -Wall

Code emitted is:

NeverCalled():
    retq
main:
    movl    $7, %eax
    retq

as if NeverCalled() had actually been called before DoSmth() and had set the DoSmth function pointer to the Return7() function.

If the function pointer assignment is removed from NeverCalled(), as here:

void NeverCalled() {}

then the emitted code is:

NeverCalled():
    retq
main:
    ud2

The latter is quite expected. The compiler knows that the function pointer is certainly null, and calling a function through a null function pointer is undefined behavior.

The former code is not really expected. Somehow the compiler decided to have Return7() called, although it is not directly called anywhere and the function pointer assignment is inside a function that is never called.

Yes, I know that a compiler facing code with undefined behavior is allowed to do this by the C++ Standard. But how does it do this?

How does clang happen to emit this specific machine code?


Answer 1:


NeverCalled is a misnomer. Any global function is potentially called (by a constructor of a global object in a different translation unit, for example).

Incidentally, this is the only way this TU can possibly be incorporated in a program that doesn't have UB. In this case, main returns 7.

Make NeverCalled static, and main will compile to empty code.
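To make the first point concrete, here is a minimal sketch of a program in which NeverCalled() really is called before main. The Initializer struct and the CallDoSmth() helper are hypothetical names introduced for illustration. In a real program the global object would typically sit in a different translation unit; it is placed in the same file here only so the sketch is self-contained.

```cpp
typedef int (*Function)();

// Static storage duration: zero-initialized (null) before any constructor runs.
static Function DoSmth;

static int Return7()
{
    return 7;
}

void NeverCalled()
{
    DoSmth = Return7;
}

// Hypothetical helper type, not part of the original code.
// Its constructor runs during dynamic initialization of the global below.
struct Initializer
{
    Initializer() { NeverCalled(); }
};

// Would normally be defined in another translation unit.
static Initializer initializer_instance;

// Stands in for the body of main() from the question.
int CallDoSmth()
{
    return DoSmth();
}
```

With the global object in place the program has no undefined behaviour, and a main() that returns CallDoSmth() returns 7, the same value clang hard-codes for the original code.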




Answer 2:


The path by which clang does this is probably something along these lines:

  • DoSmth is static, so it is zero-initialised. Since it is a pointer (to function), that amounts to initialisation to the null pointer (NULL or nullptr).
  • main() does return DoSmth(), so clang reasons that DoSmth cannot be NULL, since that would cause return DoSmth() to exhibit undefined behaviour.
  • It then reasons about the other code in the compilation unit and finds an assignment DoSmth = Return7 in NeverCalled().
  • Since that is the only statement in the compilation unit that sets DoSmth to a non-NULL value, and clang has reasoned that DoSmth is not NULL, it assumes NeverCalled() must have been called somehow.
  • As a result of the above reasoning, clang concludes that DoSmth must be equal to the address of Return7.
  • Having reasoned that DoSmth == Return7, clang converts return DoSmth() into return Return7().
  • Return7() is in the same compilation unit, so clang inlines it.

The specifics of how clang does this internally are anyone's guess. However, the various code optimisation passes probably produce a reasoning chain something like the above.
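The chain above can be pictured as a series of source-level rewrites of main(). This is only an illustrative sketch: the function names MainAsWritten, MainAfterDevirtualization and MainAfterInlining are made up here, and clang works on its intermediate representation rather than on source text.

```cpp
typedef int (*Function)();

static Function DoSmth;            // starts out as a null pointer

static int Return7() { return 7; }

void NeverCalled() { DoSmth = Return7; }

// Step 0: what main() says in the question.
// Only defined if DoSmth is non-null at the time of the call.
int MainAsWritten() { return DoSmth(); }

// Step 1: DoSmth can only ever hold null or Return7, and calling null would
// be undefined behaviour, so the indirect call is replaced by a direct one.
int MainAfterDevirtualization() { return Return7(); }

// Step 2: Return7() is visible in this translation unit and gets inlined.
int MainAfterInlining() { return 7; }
```

In any execution without undefined behaviour (that is, one where NeverCalled() has run first), all three functions return the same value, which is why the compiler is free to substitute one for another.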

The point is that your code, as it stands, has undefined behaviour. One cute feature of undefined behaviour is that a compiler is permitted (as distinct from required) to reason that your code actually has well-defined behaviour. In turn, that permits the compiler to reason that some code which makes the behaviour well-defined has been magically executed.
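For contrast, here is a sketch of a variant with no undefined behaviour: an explicit null check before the call. The GuardedCall() helper and the -1 fallback value are assumptions made for this example, not something from the original code. Because both paths are now well-defined, the compiler has no licence to assume that NeverCalled() has run.

```cpp
typedef int (*Function)();

static Function DoSmth;            // null until NeverCalled() runs

static int Return7() { return 7; }

void NeverCalled() { DoSmth = Return7; }

// Hypothetical guarded replacement for the body of main().
int GuardedCall()
{
    if (DoSmth == nullptr)
    {
        return -1;                 // arbitrary fallback chosen for this sketch
    }
    return DoSmth();
}
```

Here the result depends on whether NeverCalled() has actually executed, so the emitted code has to keep the check instead of folding the call to a constant 7.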



Source: https://stackoverflow.com/questions/46272628/how-does-clang-manage-to-compile-this-code-with-undefined-behavior-into-this-mac
