I was reading about order of evaluation violations, and they give an example that puzzles me.
1) If a side effect on a scalar object is un-sequenced r
Actually, there's a reason not to depend on the fact that compiler will check that i
is assigned with the same value twice, so that it's possible to replace it with single assignment. What if we have some expressions?
void g(int a, int b, int c, int n) {
int i;
// hey, compiler has to prove Fermat's theorem now!
f(i = 1, i = (ipow(a, n) + ipow(b, n) == ipow(c, n)));
}
The fact that the result would be the same in most implementations in this case is incidental; the order of evaluation is still undefined. Consider f(i = -1, i = -2)
: here, order matters. The only reason it doesn't matter in your example is the accident that both values are -1
.
Given that the expression is specified as one with an undefined behaviour, a maliciously compliant compiler might display an inappropriate image when you evaluate f(i = -1, i = -1)
and abort the execution - and still be considered completely correct. Luckily, no compilers I am aware of do so.
The confusion is that storing a constant value into a local variable is not one atomic instruction on every architecture the C is designed to be run on. The processor the code runs on matters more than the compiler in this case. For example, on ARM where each instruction can not carry a complete 32 bits constant, storing an int in a variable needs more that one instruction. Example with this pseudo code where you can only store 8 bits at a time and must work in a 32 bits register, i is a int32:
reg = 0xFF; // first instruction
reg |= 0xFF00; // second
reg |= 0xFF0000; // third
reg |= 0xFF000000; // fourth
i = reg; // last
You can imagine that if the compiler wants to optimize it may interleave the same sequence twice, and you don't know what value will get written to i; and let's say that he is not very smart:
reg = 0xFF;
reg |= 0xFF00;
reg |= 0xFF0000;
reg = 0xFF;
reg |= 0xFF000000;
i = reg; // writes 0xFF0000FF == -16776961
reg |= 0xFF00;
reg |= 0xFF0000;
reg |= 0xFF000000;
i = reg; // writes 0xFFFFFFFF == -1
However in my tests gcc is kind enough to recognize that the same value is used twice and generates it once and does nothing weird. I get -1, -1 But my example is still valid as it is important to consider that even a constant may not be as obvious as it seems to be.
It looks to me like the only rule pertaining to sequencing of function argument expression is here:
3) When calling a function (whether or not the function is inline, and whether or not explicit function call syntax is used), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.
This does not define sequencing between argument expressions, so we end up in this case:
1) If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
In practice, on most compilers, the example you quoted will run fine (as opposed to "erasing your hard disk" and other theoretical undefined behavior consequences).
It is, however, a liability, as it depends on specific compiler behaviour, even if the two assigned values are the same. Also, obviously, if you tried to assign different values, the results would be "truly" undefined:
void f(int l, int r) {
return l < -1;
}
auto b = f(i = -1, i = -2);
if (b) {
formatDisk();
}
C++17 defines stricter evaluation rules. In particular, it sequences function arguments (although in unspecified order).
N5659 §4.6:15
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could be executed first. —end note ]
N5659 § 8.2.2:5
The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.
It allows some cases which would be UB before:
f(i = -1, i = -1); // value of i is -1
f(i = -1, i = -2); // value of i is either -1 or -2, but not specified which one
First, "scalar object" means a type like a int
, float
, or a pointer (see What is a scalar Object in C++?).
Second, it may seem more obvious that
f(++i, ++i);
would have undefined behavior. But
f(i = -1, i = -1);
is less obvious.
A slightly different example:
int i;
f(i = 1, i = -1);
std::cout << i << "\n";
What assignment happened "last", i = 1
, or i = -1
? It's not defined in the standard. Really, that means i
could be 5
(see harmic's answer for a completely plausible explanation for how this chould be the case). Or you program could segfault. Or reformat your hard drive.
But now you ask: "What about my example? I used the same value (-1
) for both assignments. What could possibly be unclear about that?"
You are correct...except in the way the C++ standards committee described this.
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i
could still be 5
. Or your hard drive could be empty. Thus the answer to your question is:
It is undefined behavior because it is not defined what the behavior is.
(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)
Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".
That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".
A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.