Question
Let's consider the following two-thread concurrent program in C++: x and y are globals, r1 and r2 are thread-local, and stores and loads to int are atomic. Memory model = C++11.

int x = 0, y = 0;

r1 = x | r2 = y
y = r1 | x = r2
A compiler is allowed to compile it as:

int x = 0, y = 0;

r1 = x | r2 = 42
y = r1 | x = r2
       | if (y != 42)
       |     x = r2 = y
And, while it is intra-thread consistent, it can produce wild results: an execution of this program may end with (x, y) = (42, 42). This is called the out-of-thin-air values problem, and it exists; we have to live with it.
My question is: Does a memory barrier prevent a compiler from doing wild optimizations that result in out-of-thin-air values?
For example:
[fence] = atomic_thread_fence(memory_order_seq_cst);
int x = 0, y = 0;
r1 = x | r2 = y
[fence] | [fence]
y = r1 | x = r2
Answer 1:
You have data-race undefined behaviour on x and y because they're non-atomic variables, so the C++11 standard has absolutely nothing to say about what's allowed to happen.
It would be relevant to look at this for older language standards without a formal memory model, where people did threading anyway using volatile or plain int plus compiler and asm barriers, and where behaviour could depend on compilers working the way you expect in a case like this. But fortunately the bad old days of "happens to work on current implementations" threading are behind us.
Barriers are not helpful here with nothing to create synchronization; as @davmac explains, nothing requires the barriers to "line up" in the global order of operations. Think of a barrier as an operation that makes the current thread wait for some or all of its previous operations to become globally visible; barriers don't directly interact with other threads.
Out-of-thin-air values are one thing that can happen as a result of that undefined behaviour; the compiler is allowed to do software value-prediction on non-atomic variables, and to invent writes to objects that will definitely be written anyway. If there were a release-store, or a relaxed store plus a barrier, the compiler might not be allowed to invent writes before it, because that could make a value visible to other threads that the abstract machine could never have produced at that point.
In general, from a C++11 language-lawyer perspective, there's nothing you can do to make your program safe (other than a mutex, or hand-rolled locking with atomics, to prevent one thread from reading x while the other is writing it).
Relaxed atomics are sufficient to prevent the compiler from inventing writes, at no other cost, except maybe defeating auto-vectorization and the like if you were counting on other uses of this variable being aggressively optimized.
atomic_int x{0}, y{0};

r1 = x.load(mo_relaxed)  | r2 = y.load(mo_relaxed)
y.store(r1, mo_relaxed)  | x.store(r2, mo_relaxed)
Value-prediction could speculatively get a future value for r2 into the pipeline before thread 2 sees that value from y, but it can't actually become visible to other threads until the software or hardware knows for sure that the prediction was correct. (That would be inventing a write.)
e.g. thread 2 is allowed to compile as
r2 = y.load(mo_relaxed);
if (r2 == 42) { // control dependency, not a data dependency
x.store(42, mo_relaxed);
} else {
x.store(r2, mo_relaxed);
}
But as I said, x = 42; can't become visible to other threads until it's non-speculative (whether the speculation is in hardware or software), so value prediction can't invent values that other threads can see. The C++11 standard guarantees that atomic loads only observe values that were actually stored by some thread, and it recommends (though only as a non-normative "should") that implementations not produce out-of-thin-air values for relaxed atomics.
I don't know of, and can't think of, any mechanism by which a store of 42 could actually become visible to other threads before the y.load saw an actual 42 (i.e. LoadStore reordering of a load with a later dependent store). I don't think the C++ standard formally guarantees that, though. Maybe really aggressive inter-thread optimization, if the compiler can prove that r2 will always be 42 in some cases, could remove even the control dependency? An acquire-load or release-store would definitely be sufficient to block causality violations. This isn't quite mo_consume, because r2 is used as a value, not a pointer.
Answer 2:
Not by itself. In your example, there is nothing synchronising the two threads. In particular, the fences in the two threads do not cause the threads to synchronise at that point; for example, you might get the following sequence:
(Thread #1) | (Thread #2)
r1 = x |
[fence] |
y = junk temporary |
| r2 = y // junk!
| [fence]
| x = r2
y = r1 |
The simplest way to avoid out-of-thin-air results is to use atomic integers: if x and y are atomic, then they cannot take "out of thin air" values:
std::atomic<int> x{0}, y{0};
int r1 = x; | int r2 = y;
y = r1; | x = r2;
Source: https://stackoverflow.com/questions/51232730/preventing-of-out-of-thin-air-values-with-a-memory-barrier-in-c