Question
For both memory_order_seq_cst and memory_order_acq_rel, stores are release operations and loads are acquire operations. I know that memory_order_seq_cst is meant to impose an additional total ordering on all operations, but I'm failing to build an example where the behavior changes if every memory_order_seq_cst is replaced by memory_order_acq_rel.
Am I missing something, or is the difference just a documentation effect? That is, should one use memory_order_seq_cst when not intending to work with a more relaxed model, and memory_order_acq_rel when constraining the relaxed model?
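A sketch of what that substitution looks like for each kind of operation, assuming hypothetical globals `flag` and `counter`. memory_order_acq_rel is only permitted on read-modify-write operations such as fetch_add. Passing it to a plain store or load violates the precondition, so there the weaker counterparts are memory_order_release and memory_order_acquire.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state, used only to illustrate the orderings.
std::atomic<bool> flag{false};
std::atomic<int> counter{0};

// Sequentially consistent variant: every operation uses memory_order_seq_cst
// (the same as what the defaulted arguments give you).
int step_seq_cst() {
    flag.store(true, std::memory_order_seq_cst);               // plain store
    int old = counter.fetch_add(1, std::memory_order_seq_cst);  // read-modify-write
    bool seen = flag.load(std::memory_order_seq_cst);          // plain load
    return seen ? old : -1;
}

// "acq_rel" variant: the closest legal weaker order for each kind of operation.
// acq_rel is only valid on the read-modify-write; the store gets release and
// the load gets acquire.
int step_acq_rel() {
    flag.store(true, std::memory_order_release);
    int old = counter.fetch_add(1, std::memory_order_acq_rel);
    bool seen = flag.load(std::memory_order_acquire);
    return seen ? old : -1;
}
```

Called from a single thread, both variants behave identically. The difference only becomes visible when several threads observe several atomics.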
Answer 1:
http://en.cppreference.com/w/cpp/atomic/memory_order has a good example at the bottom that only works with memory_order_seq_cst. Essentially, memory_order_acq_rel provides read and write orderings relative to the atomic variable, while memory_order_seq_cst provides read and write ordering globally. That is, sequentially consistent operations are visible in the same order across all threads.
The example boils down to this:
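#include <atomic>
#include <cassert>
std::atomic<bool> x{false};  // atomics; default operations are memory_order_seq_cst
std::atomic<bool> y{false};
std::atomic<int> z{0};
void a() { x = true; }
void b() { y = true; }
void c() { while (!x); if (y) ++z; }
void d() { while (!y); if (x) ++z; }
// kick off a, b, c, d as threads, join all threads, then:
assert(z != 0);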
Operations on z are guarded by two atomic variables, not one, so you can't use acquire-release semantics to enforce that z is always incremented.
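A complete, runnable version of the cppreference example, as a minimal sketch using std::thread. The function name `run_seq_cst_once` is hypothetical. All operations spell out memory_order_seq_cst, so the two reader threads agree on a single order of the two stores and at least one of them increments z.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the four-thread example once and returns the final value of z.
int run_seq_cst_once() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    std::atomic<int> z{0};

    std::thread a([&] { x.store(true, std::memory_order_seq_cst); });
    std::thread b([&] { y.store(true, std::memory_order_seq_cst); });
    std::thread c([&] {
        while (!x.load(std::memory_order_seq_cst)) {}  // spin until x is true
        if (y.load(std::memory_order_seq_cst)) ++z;
    });
    std::thread d([&] {
        while (!y.load(std::memory_order_seq_cst)) {}  // spin until y is true
        if (x.load(std::memory_order_seq_cst)) ++z;
    });

    a.join(); b.join(); c.join(); d.join();
    return z.load();
}
```

Usage: `assert(run_seq_cst_once() != 0);` holds on every run. z is 1 or 2 depending on timing.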
Answer 2:
On ISAs like x86 where atomics map to barriers, and the actual machine model includes a store buffer:
- seq_cst stores require flushing the store buffer, so this thread's later reads are delayed until after the store is globally visible.
- acq_rel does not flush the store buffer. Normal x86 loads and stores have essentially acq and rel semantics. (seq_cst plus a store buffer with store forwarding.)
But x86 atomic RMW operations always get promoted to seq_cst because the x86 asm lock prefix is a full memory barrier. Other ISAs can do relaxed or acq_rel RMWs in asm.
https://preshing.com/20120515/memory-reordering-caught-in-the-act is an instructive example of the difference between a seq_cst store and a plain release store. (It's actually mov + mfence vs. plain mov in x86 asm. In practice, xchg is a more efficient way to do a seq_cst store on most x86 CPUs, but GCC does use mov + mfence.)
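The store-buffer pattern from that article can be sketched with C++ atomics. The function name `both_loads_saw_zero` is hypothetical, and the harness is much simpler than Preshing's (no semaphores or random delays). With weaker orders, the reordering may therefore show up only rarely per call. With both orders set to memory_order_seq_cst, the outcome r1 == 0 && r2 == 0 is forbidden. With memory_order_release stores and memory_order_acquire loads it is allowed. On x86 it can happen when each store is still sitting in its core's store buffer while the following load executes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffer (Dekker-style) litmus test: each thread stores to one
// variable, then loads the other. Returns true if BOTH loads saw 0.
bool both_loads_saw_zero(std::memory_order store_order,
                         std::memory_order load_order) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;  // written by the threads, read after join()
    std::thread t1([&] {
        x.store(1, store_order);
        r1 = y.load(load_order);
    });
    std::thread t2([&] {
        y.store(1, store_order);
        r2 = x.load(load_order);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Calling it in a loop with `std::memory_order_release, std::memory_order_acquire` may occasionally return true. With `std::memory_order_seq_cst` for both, it never should.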
Fun fact: AArch64's STLR release-store instruction is actually a sequential-release. In hardware, it has loads/stores with relaxed or seq_cst semantics, and barriers to get other strengths. But unfortunately I think rel or acq_rel has to get strengthened to seq_cst, because there's no barrier or instruction that gives everything rel needs without being even stronger and more expensive. Some other ISAs (like PowerPC) have more choices of barriers and can strengthen up to mo_rel or mo_acq_rel more cheaply than mo_seq_cst.
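One way to check these codegen points yourself is to compile a few one-line functions and compare the asm (for example on Compiler Explorer). The comments give what mainstream GCC/Clang typically emit with optimization enabled. Treat them as a rough guide: exact output depends on compiler version and target flags, such as whether the AArch64 RCpc extension is enabled. The variable `v` and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> v{0};  // hypothetical variable, only for inspecting codegen

void store_release(int n) { v.store(n, std::memory_order_release); }
// typically: x86-64 plain mov; AArch64 stlr

void store_seq_cst(int n) { v.store(n, std::memory_order_seq_cst); }
// typically: x86-64 xchg, or mov + mfence depending on compiler; AArch64 stlr (same as release)

int load_acquire() { return v.load(std::memory_order_acquire); }
// typically: x86-64 plain mov; AArch64 ldar, or ldapr when RCpc is enabled

int load_seq_cst() { return v.load(std::memory_order_seq_cst); }
// typically: x86-64 plain mov; AArch64 ldar

int rmw_relaxed(int n) { return v.fetch_add(n, std::memory_order_relaxed); }
// typically: x86-64 lock xadd, a full barrier regardless of the requested order
```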
Answer 3:
Take the same definition and example from the memory_order page, but replace memory_order_seq_cst with memory_order_release in the stores and memory_order_acquire in the loads.
Release-acquire ordering guarantees that everything that happened before a store in one thread becomes a visible side effect in the thread whose load reads the stored value. But in our example, nothing happens before the store in either thread0 or thread1.
x.store(true, std::memory_order_release); // thread0
y.store(true, std::memory_order_release); // thread1
Furthermore, without memory_order_seq_cst, the sequential ordering within thread2 and thread3 is not guaranteed. You can imagine them becoming:
if (y.load(std::memory_order_acquire)) { ++z; } // thread2, load y first
while (!x.load(std::memory_order_acquire)); // and then, load x
if (x.load(std::memory_order_acquire)) { ++z; } // thread3, load x first
while (!y.load(std::memory_order_acquire)); // and then, load y
So, if thread2 and thread3 are executed before thread0 and thread1, both x and y stay false. Thus ++z is never reached, z stays 0, and the assert fires.
However, if memory_order_seq_cst enters the picture, it establishes a single total modification order of all atomic operations that are so tagged. Thus the order x.load then y.load in thread2, and y.load then x.load in thread3, is guaranteed.
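For comparison, here is a sketch of the same four-thread example with the substitution this answer describes: release stores and acquire loads. The function name `run_acq_rel_once` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Same example with stores using release and loads using acquire.
// Without the single total order that seq_cst provides, the two reader
// threads may disagree about which store happened first, so the C++ model
// allows z to end up 0 (i.e. assert(z != 0) may fire).
int run_acq_rel_once() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    std::atomic<int> z{0};

    std::thread a([&] { x.store(true, std::memory_order_release); });
    std::thread b([&] { y.store(true, std::memory_order_release); });
    std::thread c([&] {
        while (!x.load(std::memory_order_acquire)) {}  // spin until x is true
        if (y.load(std::memory_order_acquire)) ++z;
    });
    std::thread d([&] {
        while (!y.load(std::memory_order_acquire)) {}  // spin until y is true
        if (x.load(std::memory_order_acquire)) ++z;
    });

    a.join(); b.join(); c.join(); d.join();
    return z.load();
}
```

In practice, z == 0 is hard to observe on x86, whose stores become visible to all other cores in a single order. The missing guarantee lies in the C++ model, not necessarily in every piece of hardware.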
Source: https://stackoverflow.com/questions/12340773/how-do-memory-order-seq-cst-and-memory-order-acq-rel-differ