I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that `z != 0` can fire.
Release-acquire synchronization gives (at least) this guarantee: side effects performed before a release on a memory location are visible after an acquire on that same memory location (one that actually observes the released value).
There is no such guarantee if the memory location is not the same. More importantly, there is no total (think: global) ordering guarantee.
Looking at the example, thread A makes thread C come out of its loop, and thread B makes thread D come out of its loop.
However, the way a release may "publish" to an acquire (or the way an acquire may "observe" a release) on the same memory location doesn't require total ordering. It's possible for thread C to observe A's release and thread D to observe B's release, and only somewhere in the future for C to observe B's release and for D to observe A's release.
The example has four threads because that's the minimum needed to force such non-intuitive behavior. If any of the atomic operations were done in the same thread, there would be an ordering you couldn't violate.
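For reference, here's a sketch of the kind of listing being discussed, reconstructed from the function and thread names used in this answer (the book's exact listing may differ in minor details):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
std::atomic<int> z{0};

void write_x() { x.store(true, std::memory_order_release); }
void write_y() { y.store(true, std::memory_order_release); }

void read_x_then_y() {
    while (!x.load(std::memory_order_acquire))
        ;                                      // spin until this thread sees x == true
    if (y.load(std::memory_order_acquire))     // may still see y == false
        ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_acquire))
        ;                                      // spin until this thread sees y == true
    if (x.load(std::memory_order_acquire))     // may still see x == false
        ++z;
}

int main() {
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);  // can fire: both readers may see the "other" flag as false
}
```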
For instance, if `write_x` and `write_y` happened on the same thread, it would require that whatever thread observed a change in `y` would also have to observe a change in `x`.
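A minimal sketch of that variant (my own modification, reusing the globals and readers from the sketch above): the store to `x` is sequenced before the release store to `y`, so any thread that acquire-loads `y == true` synchronizes with that store and must also see `x == true`.

```cpp
// Hypothetical variant: both stores in a single writer thread.
void write_x_then_y() {
    x.store(true, std::memory_order_release);
    // Sequenced after the x store; acquiring y == true publishes x == true as well.
    y.store(true, std::memory_order_release);
}

int main() {
    std::thread a(write_x_then_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); c.join(); d.join();
    assert(z.load() != 0);  // cannot fire: read_y_then_x is now guaranteed to increment z
}
```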
Similarly, if `read_x_then_y` and `read_y_then_x` happened on the same thread, you would observe changes in both `x` and `y`, at least in `read_y_then_x`.
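Again a sketch of my own (reusing the globals and readers from above): once the combined reader has seen `x == true` in the first half, read-read coherence forbids it from later seeing `x == false`, so the second half must increment `z`.

```cpp
// Hypothetical variant: one thread runs both readers back to back.
void read_x_then_y_then_read_y_then_x() {
    read_x_then_y();  // may or may not see y == true yet
    read_y_then_x();  // has already observed x == true above, so ++z is guaranteed here
}

int main() {
    std::thread a(write_x), b(write_y), c(read_x_then_y_then_read_y_then_x);
    a.join(); b.join(); c.join();
    assert(z.load() != 0);  // cannot fire
}
```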
Having `write_x` and `read_x_then_y` in the same thread would be pointless for the exercise, as it would be obvious that it's not synchronizing correctly. The same goes for having `write_x` and `read_y_then_x` in the same thread, which would always read the latest `x`.
EDIT:

> The way I am reasoning about this is that if thread a (`write_x`) stores to `x` then all the work it has done so far is synced with any other thread that reads `x` with acquire ordering. (...) I can't think of any 'run' or memory ordering where `z` is never incremented. Can someone explain where my reasoning is flawed? Also, I know the loop read will always be before the if statement read because the acquire prevents this reordering.
That's sequentially consistent ordering, which imposes a total order. That is, it imposes that `write_x` and `write_y` both be visible to all threads, one after the other: either `x` then `y`, or `y` then `x`, but in the same order for all threads.
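For comparison, a sketch of the sequentially consistent version (the ordering the book uses in its earlier listing, as far as I recall): with `memory_order_seq_cst` on every operation there is a single total order over the two stores, so whichever reader loops on the store that comes second must also see the one that came first, and the assertion can no longer fire.

```cpp
// Same globals as above, but every operation is sequentially consistent.
void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }

void read_x_then_y() {
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst))  // must see true if write_y precedes write_x in the total order
        ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst))  // must see true if write_x precedes write_y in the total order
        ++z;
}
// Whichever store is second in the total order, the reader looping on it also
// sees the first, so at least one of the two ifs takes the ++z branch.
```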
With release-acquire, there is no total order. The effects of a release are only guaranteed to be visible to a corresponding acquire on the same memory location. With release-acquire, the effects of `write_x` are guaranteed to be visible to whoever notices that `x` has changed.
This "noticing something changed" is very important. If you don't notice a change, you're not synchronizing. As such, thread C is not synchronizing on `y`, and thread D is not synchronizing on `x`.
Essentially, it's way easier to think of release-acquire as a change notification system that only works if you synchronize properly. If you don't synchronize, you may or may not observe side-effects.
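A sketch of that mental model in the usual publish/consume form (my own example, not from the book): the consumer only gets the visibility guarantee because it actually observes the release store, i.e. it notices the "notification".

```cpp
#include <atomic>
#include <iostream>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // the "notification" flag

void producer() {
    payload = 42;                                   // side effect before the release
    ready.store(true, std::memory_order_release);   // publish / notify
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // wait until the change is noticed
        ;
    // Having observed ready == true, this thread synchronized with the producer,
    // so the write to payload is guaranteed to be visible here.
    std::cout << payload << '\n';  // always prints 42
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
```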
Hardware architectures with strong memory models and cache coherence (even across NUMA), or languages/frameworks that synchronize in terms of a total order, make it difficult to think in these terms, because on them it's practically impossible to observe this effect.