As is well known, C++11 introduced six memory orders, and the documentation for std::memory_order_acquire says:
A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load.
That's a rule of thumb for compiler code generation.
But it is absolutely not an axiom of C++.
There are many cases, some trivially detectable, some requiring more work, where a memory operation Op on V can be provably reordered with an atomic operation X on A.
The two most obvious cases:
(Note that these two reorderings by the compiler are valid for any of the possible memory orders specified for X.)
In any case, the transformation is not visible: it doesn't change the possible executions of valid programs.
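A minimal sketch of one such invisible reordering, assuming (this is my illustration, not a case quoted from the text above) that V is a local variable no other thread can reach. The names `ready` and `load_with_local_work` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> ready{0};  // hypothetical shared flag, plays the role of A

int load_with_local_work(int x) {
    // X: an acquire load on the shared atomic.
    int r = ready.load(std::memory_order_acquire);

    // Op on V: `local` never has its address taken, so no other thread can
    // observe it. Under the as-if rule the compiler may compute it before
    // the load; no valid program can tell the difference.
    int local = x * 2 + 1;

    return r + local;
}
```

Whether a given compiler actually hoists the computation is an implementation detail; the point is only that the standard does not forbid it.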
There are less obvious cases where these types of code transformations are valid. Some are contrived, some are realistic.
I can easily come up with this contrived example:
#include <atomic>
using namespace std;

static atomic<int> A;  // static storage: zero-initialized, internal linkage

int do_acq() {
    return A.load(memory_order_acquire);
}

void do_rel() {
    A.store(0, memory_order_release);
} // that's all folks for that TU
Note: the static variable is used so that all operations on the object are visible in this one TU, even with separate compilation; the functions that access the atomic synchronization object are not static and can be called from anywhere in the program.
As a synchronization primitive, operations on A establish synchronizes-with relations: there is one between a thread X that calls do_rel() at point pX and a thread Y that calls do_acq() at point pY.
There is a well-defined modification order M of A corresponding to the calls to do_rel() in different threads. Each call to do_acq() that reads the value written by a call to do_rel() at pX_i synchronizes with thread X by pulling in the history of X at pX_i.
On the other hand, the value is always 0, so the calling code only ever gets a 0 from do_acq() and cannot determine what happened from the return value. It can know a priori that a modification of A has already happened, but it cannot learn this a posteriori from the value it reads. The a priori knowledge can come from another synchronization operation, and such knowledge is already part of the history of thread Y. Either way, the acquire operation does not add knowledge and does not add past history: the known part of the acquire operation is empty, and it doesn't reliably acquire anything that was not already in the past of thread Y at pY_i. So the acquire on A is meaningless and can be optimized out.
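To make the "return value carries no information" point concrete, here is a small single-threaded sketch. It repeats the two functions from the example and adds a hypothetical helper, `caller_can_tell_difference`, which is not part of the original code.

```cpp
#include <atomic>
#include <cassert>

static std::atomic<int> A{0};  // explicitly 0 here; the original relies on static zero-initialization

int do_acq() { return A.load(std::memory_order_acquire); }
void do_rel() { A.store(0, std::memory_order_release); }

// Hypothetical helper: compares what do_acq() returns before and after a
// call to do_rel(). Every store writes 0 and the initial value is 0, so the
// two reads are always equal.
bool caller_can_tell_difference() {
    int before = do_acq();
    do_rel();
    int after = do_acq();
    return before != after;
}
```

The same holds across threads: whichever store in M the load happens to read, the caller receives 0 and cannot branch on it.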
In other words: a program that is valid for all possible values of M must be valid when do_acq() sees the most recent do_rel() in the history of Y, the one that comes before all modifications of A that can be seen. So do_rel() adds nothing in general: do_rel() can add a non-redundant synchronizes-with relation in some executions, but the minimum of what it adds to Y is nothing. A correct program, one that doesn't have a race condition (expressed as: its behavior depends on M, such that its correctness is a function of getting some subset of the allowable values for M), must therefore be prepared to handle getting nothing from do_rel(), so the compiler can make do_rel() a NOP.
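Spelled out as code, the argument says the translation unit could be compiled as if it had been written the second way below. This is only a sketch of what the reasoning permits; the namespaces `as_written` and `as_argued` and the helper `same_return_values` are hypothetical, and as far as I know mainstream compilers are conservative with atomics and do not perform this transformation today.

```cpp
#include <atomic>
#include <cassert>

namespace as_written {
static std::atomic<int> A{0};
int do_acq() { return A.load(std::memory_order_acquire); }
void do_rel() { A.store(0, std::memory_order_release); }
}  // namespace as_written

namespace as_argued {
// Under the argument above: the load always yields 0 and the
// synchronization it provides is never something a correct program can
// rely on, so both operations disappear.
int do_acq() { return 0; }
void do_rel() {}
}  // namespace as_argued

// Hypothetical check that both versions return the same value to a caller.
bool same_return_values() {
    as_written::do_rel();
    as_argued::do_rel();
    return as_written::do_acq() == as_argued::do_acq();
}
```

The check only compares return values; it cannot demonstrate the absence of a synchronizes-with edge, which is exactly the part the argument claims no correct program may depend on.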
[Note: this line of argument doesn't easily generalize to all RMW operations that read a 0 and store a 0. It probably can't work for acq-rel RMW. In other words, acq+rel RMW operations are more powerful than separate loads and stores, because of their “side effect”.]
Summary: in this particular example, not only can memory operations move up and down with respect to an atomic acquire operation, but the atomic operations can be removed completely.