Using an atomic read-modify-write operation in a release sequence

☆樱花仙子☆ 提交于 2021-02-17 21:49:06

问题


Say, I create an object of type Foo in thread #1 and want to be able to access it in thread #3.
I can try something like:

std::atomic<int> sync{10};
Foo *fp;

// thread 1: modifies sync: 10 -> 11
fp = new Foo;
sync.store(11, std::memory_order_release);

// thread 2a: modifies sync: 11 -> 12
while (sync.load(std::memory_order_relaxed) != 11);
sync.store(12, std::memory_order_relaxed);

// thread 3
while (sync.load(std::memory_order_acquire) != 12);
fp->do_something();
  • The store/release in thread #1 orders Foo with the update to 11
  • thread #2a non-atomically increments the value of sync to 12
  • the synchronizes-with relationship between thread #1 and #3 is only established when #3 loads 11

The scenario is broken because thread #3 spins until it loads 12, which may arrive out of order (wrt 11) and Foo is not ordered with 12 (due to the relaxed operations in thread #2a).
This is somewhat counter-intuitive since the modification order of sync is 10 → 11 → 12

The standard says (§ 1.10.1-6):

an atomic store-release synchronizes with a load-acquire that takes its value from the store (29.3). [ Note: Except in the specified cases, reading a later value does not necessarily ensure visibility as described below. Such a requirement would sometimes interfere with efficient implementation. —end note ]

It also says in (§ 1.10.1-5):

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous subsequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation
- is performed by the same thread that performed A, or
- is an atomic read-modify-write operation.

Now, thread #2a is modified to use an atomic read-modify-write operation:

// thread 2b: modifies sync: 11 -> 12
int val;
while ((val = 11) && !sync.compare_exchange_weak(val, 12, std::memory_order_relaxed));

If this release sequence is correct, Foo is synchronized with thread #3 when it loads either 11 or 12. My questions about the use of an atomic read-modify-write are:

  • Does the scenario with thread #2b constitute a correct release sequence ?

And if so:

  • What are the specific properties of a read-modify-write operation that ensure this scenario is correct ?

回答1:


Does the scenario with thread #2b constitute a correct release sequence ?

Yes, per your quote from the standard.

What are the specific properties of a read-modify-write operation that ensure this scenario is correct?

Well, the somewhat circular answer is that the only important specific property is that "The C++ standard defines it so".

As a practical matter, one may ask why the standard defines it like this. I don't think you'll find that the answer has a deep theoretical basis: I think the committee could have also defined it such that the RMW doesn't participate in the release sequence, or (perhaps with more difficulty) have defined so that both the RMW and the separate mo_relaxed load and store participate in the release sequence, without compromising the "soundness" of the model.

They already give a performance related as to why they didn't choose the latter approach:

Such a requirement would sometimes interfere with efficient implementation.

In particular, on any hardware platform that allowed load-store reordering, it would imply that even mo_relaxed loads and/or stores might require barriers! Such platforms exist today. Even on more strongly ordered platforms, it may inhibit compiler optimizations.

So why didn't they take then take other "consistent" approach of not requiring RMW mo_relaxed to participate in the release sequence? Probably because existing hardware implementations of RMW operations provide such guarantees and the nature of RMW operations makes it likely that this will be true in the future. In particular, as Peter points in the comments above, RMW operations, even with mo_relaxed are conceptually and practically1 stronger than separate loads and stores: they would be quite useless if they didn't have a consistent total order.

Once you accept that is how hardware works, it makes sense from a performance angle to align the standard: if you didn't, you'd have people using more restrictive orderings such as mo_acq_rel just to get the release sequence guarantees, but on real hardware that has weakly ordered CAS, this doesn't come for free.


1 The "practically" part means that even the weakest forms of RMW instructions are usually relatively "expensive" operations taking a dozen cycles or more on modern hardware, while mo_relaxed loads and stores generally just compile to plain loads and stores in the target ISA.



来源:https://stackoverflow.com/questions/45694459/using-an-atomic-read-modify-write-operation-in-a-release-sequence

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!