Is a C++ optimizer allowed to move statements across a function call?

Submitted by 放荡痞女 on 2019-12-04 23:35:39

Now, I was always under the impression that when an optimizer inlines some function (which may very well be the case in a (simple) benchmark scenario), it is free to rearrange whatever it likes as long as it does not affect observable behavior.

It absolutely is. The optimizer doesn't even need to inline it for this to occur in theory.

However, timing functions are observable behaviour: specifically, they are I/O on the part of the system. The optimizer cannot know that this I/O will produce the same outcome (it obviously won't) if performed in a different order relative to other I/O calls, which can include non-obvious things such as memory allocation calls that may invoke syscalls to get their memory.

What this basically means is that by and large, for most function calls, the optimizer can't do a great deal of re-arranging because there's potentially a vast quantity of state involved that it can't reason about.

Furthermore, the optimizer can't really know that re-arranging your function calls will actually make the code run faster, and doing so would make debugging harder, so compiler writers don't have a great deal of incentive to go screwing around with the program's stated order.

Basically, in theory the optimizer can do this, but in reality it won't because doing so would be a massive undertaking for not a lot of benefit.

You'll only encounter conditions like this if your benchmark is fairly trivial or consists almost entirely of primitive operations like integer addition, in which case you'll want to check the assembly anyway.

Your concern is perfectly valid: the optimizer is allowed to move anything past a function call if it can prove that this does not change observable behavior (other than runtime, that is).

The point of using a function to stop the optimizer from moving things is precisely not to tell the optimizer anything about that function. That is, the function must not be inlined, and its definition must not be in the same compilation unit as the call. Since optimizers are generally a compiler feature, moving the function definition to a different compilation unit deprives the optimizer of the information it needs to prove anything about the function, and consequently stops it from moving anything across the function call.

Beware that this assumes that no linker is doing global analysis for optimization (link-time optimization). If one is, it can still screw you.

What the comment you quoted has not considered is that sequence points are not primarily about order of execution (although they do constrain it, they don't act as full barriers), but rather about values of expressions.

C++11 actually gets rid of the "sequence point" terminology completely, and instead discusses the ordering of "value computations" and "side effects".

To illustrate, the following code exhibits undefined behavior because it doesn't respect ordering:

int a = 5;
int x = a++ + a;

This version is well-defined:

int a = 5;
a++;
int x = a + a;

What the sequence point (or, in C++11 terms, the ordering of side effects and value computations) guarantees us is that the a used in x = a + a is 6, not 5. So the compiler cannot rewrite it to:

int a = 5;
int x = a + a;
a++;

However, it's perfectly legal to rewrite it as:

int a = 5;
int x = (a+1) + (a+1);
a++;

The order of execution between assigning x and assigning a isn't constrained, because neither of them is volatile or atomic<T> and they aren't externally visible side effects.
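As a small check of the two well-defined snippets above, here is a minimal sketch that wraps each ordering in a function and compares the results. The function names (original_order, rewritten_order, check_rewrite_is_equivalent) are made up for illustration and do not come from the original answer.

```cpp
#include <cassert>
#include <utility>

// Source order: increment a first, then read it twice.
// Returns the final values as {a, x}.
std::pair<int, int> original_order() {
    int a = 5;
    a++;
    int x = a + a;
    return {a, x};
}

// An order the compiler may legally produce: compute x from the
// not-yet-incremented a, then perform the increment afterwards.
std::pair<int, int> rewritten_order() {
    int a = 5;
    int x = (a + 1) + (a + 1);
    a++;
    return {a, x};
}

// Both orderings end with the same a and the same x, so no
// conforming program can tell them apart.
void check_rewrite_is_equivalent() {
    assert(original_order() == rewritten_order());
}
```

Both functions finish with a equal to 6 and x equal to 12, which is all the as-if rule requires; which store happens first is left to the compiler.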

The standard definitely leaves room for the optimizer to sequence operations across the boundary of a function:

1.9/15 Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.

as long as the as-if rule is respected:

1.9/5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.

The practice of leaving the optimizer in the dark, as suggested by cmaster, is in general very effective. By the way, the global optimization issue at link time can also be circumvented by dynamically linking the benchmarked function.

There is, however, another hard sequencing constraint that can be used to achieve the same purpose, even within the same compilation unit:

1.9/15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.

So you may safely use an expression like:

 my_timer_off(stop, f( my_timer_on(start) ) );  

This "functional" style ensures that:

  • my_timer_on() is evaluated before any statement of f() is executed, and
  • f() is called before the body of my_timer_off() is executed,

thus ensuring the sequence timer-on / f / timer-off (the my_timer_xx functions would take start/stop by value).

Of course, this assumes that the signature of the benchmarked function f() can be changed to allow the expression above.
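The nested expression above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the bodies of my_timer_on() and my_timer_off(), the stand-in workload f(), the wrapper timed_run(), and the use of std::chrono::steady_clock are all invented here for illustration. Unlike the description above, start and stop are passed by reference so that the helpers can write the time stamps into them.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: stamps `start` and returns a value that f()
// consumes, so f() cannot begin before this function has returned.
int my_timer_on(Clock::time_point& start) {
    start = Clock::now();
    return 1000;  // arbitrary workload size, an assumption of this sketch
}

// Stand-in for the benchmarked function: sums 0..n-1.
long long f(int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Hypothetical helper: its body runs only after its argument f(...)
// has been evaluated, so `stop` is stamped after f() has returned.
long long my_timer_off(Clock::time_point& stop, long long result) {
    stop = Clock::now();
    return result;
}

// Evaluates the nested expression and returns f's result; the two
// time points are written through the reference parameters.
long long timed_run(Clock::time_point& start, Clock::time_point& stop) {
    long long result = my_timer_off(stop, f(my_timer_on(start)));
    assert(start <= stop);  // steady_clock never goes backwards
    return result;
}
```

A caller would declare two Clock::time_point variables, pass them to timed_run(), and report stop - start. The data dependency (f consumes the return value of my_timer_on, and my_timer_off consumes the return value of f) is what ties the three calls into the intended order.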
