Is a C++ optimizer allowed to move statements across a function call?

Question

Note: No multithreading at all here. Just optimized single-threaded code.

A function call introduces a sequence point. (Apparently.)

Does it follow that a compiler (if the optimizer inlines the function) is not allowed to move or intermingle any instructions before or after the call with the function's own instructions? (Except, obviously, where it can "prove" there are no observable effects.)

Explanatory background:

Now, there is a nice article about a benchmarking class for C++, in which the author states:

The code we time won’t be rearranged by the optimizer and will always lie between those start / end calls to now(), so we can guarantee our timing will be valid.

to which I asked how he can be sure, and nick replied:

You can check the comment in this answer https://codereview.stackexchange.com/a/48884. I quote: “I would be careful about timing things that are not functions because of optimizations that the compiler is allowed to do. I am not sure about the sequencing requirements and the observable behavior understanding of such a program. With a function call the compiler is not allowed to move statements across the call point (they are sequenced before or after the call).”

What we do is basically abstract the callable (function, lambda, block of code surrounded by a lambda) and have a single call callable(factor) inside the measure structure that acts as a barrier (not a barrier in the multithreading sense, but I believe I convey the message).
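To make the approach under discussion concrete, here is a minimal sketch of such a measure structure: the timed code is wrapped in a callable and invoked exactly once between two clock reads. The names `measure`, `execution` and `time_loop` are hypothetical; the article's actual class may look different. Whether that single call really acts as a barrier is exactly what the question asks.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical sketch: invoke the callable once, between two reads of ClockT.
template <typename TimeT = std::chrono::microseconds,
          typename ClockT = std::chrono::steady_clock>
struct measure {
    template <typename F, typename... Args>
    static typename TimeT::rep execution(F&& callable, Args&&... args) {
        auto start = ClockT::now();
        // The single call that is supposed to act as a "barrier".
        std::forward<F>(callable)(std::forward<Args>(args)...);
        auto stop = ClockT::now();
        return std::chrono::duration_cast<TimeT>(stop - start).count();
    }
};

// Usage: time a loop whose length is given by `factor` (the callable(factor) call).
inline long long time_loop(int factor) {
    assert(factor >= 0);
    return measure<>::execution([](int n) {
        volatile int sink = 0;  // volatile accesses are observable, so the loop is kept
        for (int i = 0; i < n; ++i) sink = sink + i;
    }, factor);
}
```

Note that if the lambda is inlined into `execution`, nothing in the language itself stops its non-observable work from being scheduled around the two `now()` calls; that is the point the answers below address.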

I am quite unsure about this, especially the quote:

With a function call the compiler is not allowed to move statements across the call point (they are sequenced before or after the call).

Now, I was always under the impression that when an optimizer inlines some function (which may very well be the case in a (simple) benchmark scenario), it is free to rearrange whatever it likes as long as it does not affect observable behavior.

That is, as far as the language / the optimizer are concerned, these two snippets are exactly the same:

void f() {
  // do stuff / Multiple statements
}

auto start = ...;
f();
auto stop = ...;

vs.

auto start = ...;
  // do stuff / Multiple statements
auto stop = ...;

Answer 1:

Now, I was always under the impression that when an optimizer inlines some function (which may very well be the case in a (simple) benchmark scenario), it is free to rearrange whatever it likes as long as it does not affect observable behavior.

It absolutely is. The optimizer doesn't even need to inline it for this to occur in theory.

However, timing functions are observable behaviour; specifically, they perform I/O on the part of the system. The optimizer cannot know that this I/O will produce the same outcome (it obviously won't) if performed in a different order relative to other I/O calls, and those can include non-obvious things such as memory allocation calls, which may invoke syscalls to obtain their memory.

What this basically means is that by and large, for most function calls, the optimizer can't do a great deal of re-arranging because there's potentially a vast quantity of state involved that it can't reason about.

Furthermore, the optimizer can't really know that re-arranging your function calls will actually make the code run faster, and doing so would make debugging harder, so optimizers have little incentive to go screwing around with the program's stated order.

Basically, in theory the optimizer can do this, but in reality it won't because doing so would be a massive undertaking for not a lot of benefit.

You'll only encounter conditions like this if your benchmark is fairly trivial or consists virtually entirely of primitive operations like integer addition, in which case you'll want to check the assembly anyway.
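A sketch of the "fairly trivial" case mentioned above, timing nothing but integer additions. The global `g_sink` and the function `time_sum` are hypothetical names for this illustration. Storing the result to a volatile before the second clock read means, in practice, that the result must be available by then; but the compiler may still replace the loop with a closed-form expression, leaving almost nothing to time. That is why inspecting the generated assembly is the reliable check.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical global sink: a volatile store counts as observable behavior.
volatile std::uint64_t g_sink = 0;

inline std::chrono::nanoseconds time_sum(std::uint64_t n) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) sum += i;  // primitive integer additions only
    // The store must stay before the opaque now() call below, but the loop
    // itself may still be folded into a closed form by the optimizer.
    g_sink = sum;
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```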

Answer 2:

Your concern is perfectly valid: the optimizer is allowed to move anything past a function call if it can prove that this does not change observable behavior (other than runtime, that is).

The point of using a function to stop the optimizer from reordering is to not tell the optimizer anything about the function. That is, the function must not be inlined, and it must not be included in the same compilation unit. Since optimizers are generally a compiler feature, moving the function definition to a different compilation unit deprives the optimizer of the information necessary to prove anything about the function, and consequently stops it from moving anything across the function call.
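The answer's advice needs two source files. As a single-file approximation (a swapped-in technique, not what the answer prescribes), the sketch below calls the benchmarked function through a volatile function pointer: the pointer must be re-read at the call, so the compiler cannot know which function runs and cannot inline it or move its work across the call. The names `work`, `work_ptr` and `timed_work` are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical benchmarked function.
long long work(int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += static_cast<long long>(i) * i;
    return s;
}

// Volatile pointer to function: its value is unknown to the optimizer at each call.
long long (* volatile work_ptr)(int) = &work;

inline long long timed_work(int n, std::chrono::nanoseconds& elapsed) {
    auto start = std::chrono::steady_clock::now();
    long long r = work_ptr(n);  // opaque call: target unknown, body invisible
    auto stop = std::chrono::steady_clock::now();
    elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    assert(elapsed.count() >= 0);  // steady_clock is monotonic
    return r;
}
```

A genuinely separate compilation unit gives the same opacity without the extra indirection, subject to the link-time caveat that follows.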

Beware that this assumes that there is no linker doing global analysis for optimization (link-time optimization). If there is, it can still screw you.

Answer 3:

What the comment you quoted has not considered is that sequence points are not primarily about order of execution (although they do constrain it, they don't act as full barriers), but rather about values of expressions.

C++11 actually gets rid of the "sequence point" terminology completely, and instead discusses ordering of "value computations" and "side effects".

To illustrate, the following code exhibits undefined behavior because it doesn't respect ordering:

int a = 5;
int x = a++ + a;

This version is well-defined:

int a = 5;
a++;
int x = a + a;

What the sequence point / ordering of side effects and value computations guarantees us is that the a used in x = a + a is 6, not 5. So the compiler cannot rewrite it to:

int a = 5;
int x = a + a;
a++;

However, it's perfectly legal to rewrite it as:

int a = 5;
int x = (a+1) + (a+1);
a++;

The order of execution between assigning x and assigning a isn't constrained, because neither of them is volatile or atomic<T> and they aren't externally visible side effects.
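The three versions from this answer can be compared directly. This sketch (function names are hypothetical) wraps each in a `constexpr` function returning the final `{a, x}` pair and checks at compile time that the legal rewrite yields the same values as the original, while the illegal one changes `x`. It assumes C++14 or later for multi-statement `constexpr` functions.

```cpp
#include <utility>

// The well-defined version: a++ is sequenced before x = a + a.
constexpr std::pair<int, int> as_written() {
    int a = 5;
    a++;
    int x = a + a;
    return {a, x};
}

// Legal rewrite: x uses (a + 1) before a is incremented; final values are unchanged.
constexpr std::pair<int, int> legal_rewrite() {
    int a = 5;
    int x = (a + 1) + (a + 1);
    a++;
    return {a, x};
}

// Illegal rewrite: moving a++ after x changes the value of x.
constexpr std::pair<int, int> illegal_rewrite() {
    int a = 5;
    int x = a + a;
    a++;
    return {a, x};
}

static_assert(as_written() == legal_rewrite(), "same values: allowed under as-if");
static_assert(as_written().second != illegal_rewrite().second, "x differs: not allowed");
```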

Answer 4:

The standard definitely leaves the optimizer free room to sequence operations across the boundary of a function:

1.9/15 Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
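A small illustration of "indeterminately sequenced", using hypothetical helpers `h` and `k` that append to a log. In `h() + k()` the standard does not say which call runs first, but in the abstract machine their bodies do not interleave, so the log is one of two possible orders and never a mix.

```cpp
#include <cassert>
#include <string>

std::string g_log;  // hypothetical log of execution order

int h() { g_log += "h1"; g_log += "h2"; return 1; }
int k() { g_log += "k1"; g_log += "k2"; return 2; }

inline int sum_hk() {
    g_log.clear();
    int r = h() + k();  // h and k are indeterminately sequenced
    assert(g_log == "h1h2k1k2" || g_log == "k1k2h1h2");
    return r;
}
```

Only the observable result is constrained; under the as-if rule the generated machine code may still interleave work that has no observable effect.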

as long as the as-if rule is respected:

1.9/5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.

The practice of keeping the optimizer blind, as suggested by cmaster, is in general very effective. By the way, the global optimization issue at link time can also be circumvented by dynamically linking the benchmarked function.

There is, however, another hard sequencing constraint that can be used to achieve the same purpose, even within the same compilation unit:

1.9/15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.

So you may safely use an expression like:

 my_timer_off(stop, f( my_timer_on(start) ) );

This "functional" writing ensures that:

my_timer_on() is evaluated before any statement of f() is executed,
f() is evaluated (and returns) before the body of my_timer_off() is executed,
thus ensuring the sequence timer-on / f / timer-off (the my_timer_xx functions would take start/stop by value).

Of course, this assumes that the signature of the benchmarked function f() can be changed to allow the expression above.
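A minimal sketch of the "functional" pattern, under stated assumptions: the answer only names `my_timer_on` and `my_timer_off`, so their bodies here are hypothetical. This version takes the time points by reference so each helper can record into the caller's variable, and `my_timer_on` is extended with a `factor` parameter it passes through to `f`. The nesting fixes timer-on / f / timer-off in the abstract machine; whether the generated code keeps purely local work inside that window is still subject to the as-if rule, so checking the assembly remains worthwhile.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical: records the start time and passes the workload size on to f.
inline int my_timer_on(Clock::time_point& start, int factor) {
    start = Clock::now();
    return factor;
}

// Hypothetical: records the stop time once f's result has been computed.
template <typename R>
R my_timer_off(Clock::time_point& stop, R result) {
    stop = Clock::now();
    return result;
}

// Benchmarked function whose signature accepts my_timer_on's return value.
inline long long f(int factor) {
    long long s = 0;
    for (int i = 0; i < factor; ++i) s += i;
    return s;
}

inline long long run_timed(int factor, Clock::duration& elapsed) {
    Clock::time_point start, stop;
    // Argument evaluation is sequenced before each callee's body:
    // my_timer_on -> f -> my_timer_off.
    long long r = my_timer_off(stop, f(my_timer_on(start, factor)));
    elapsed = stop - start;
    assert(elapsed >= Clock::duration::zero());
    return r;
}
```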

Source: https://stackoverflow.com/questions/29593156/is-a-c-optimizer-allowed-to-move-statements-across-a-function-call

Tags

c++

optimization

inline

operator-precedence