Question
Let's consider the following two-thread concurrent program in C++: x and y are globals, r1 and r2 are thread-local, and stores and loads to int are atomic. Memory model = C++11.

int x = 0, y = 0;

r1 = x | r2 = y
y = r1 | x = r2
A compiler is allowed to compile it as:

int x = 0, y = 0;

r1 = x | r2 = 42
y = r1 | x = r2
       | if (y != 42)
       |     x = r2 = y
And, while it is intra-thread consistent, it can produce wild results: an execution of this program may end with (x, y) = (42, 42). This is called the out-of-thin-air values problem, and it exists; we have to live with it.
My question is: Does a memory barrier prevent a compiler from doing wild optimizations that result in out-of-thin-air values?
For example:
[fence] = atomic_thread_fence(memory_order_seq_cst);
int x = 0, y = 0;
r1 = x | r2 = y
[fence] | [fence]
y = r1 | x = r2
Answer 1:
You have data-race undefined behaviour on x and y because they're non-atomic variables, so the C++11 standard has absolutely nothing to say about what's allowed to happen.
It would be relevant to look at this for older language standards without a formal memory model, where people did threading anyway using volatile or plain int plus compiler and asm barriers, and where behaviour could depend on compilers working the way you expect in a case like this. But fortunately the bad old days of "happens to work on current implementations" threading are behind us.
Barriers are not helpful here with nothing to create synchronization; as @davmac explains, nothing requires the barriers to "line up" in the global order of operations. Think of a barrier as an operation that makes the current thread wait for some or all of its previous operations to become globally visible; barriers don't directly interact with other threads.
Out-of-thin-air values are one thing that can happen as a result of that undefined behaviour; the compiler is allowed to do software value-prediction on non-atomic variables, and to invent writes to objects that will definitely be written anyway. If there were a release-store, or a relaxed store plus a barrier, the compiler might not be allowed to invent writes before it, because that could make a value visible to other threads that the abstract machine could never have produced at that point.
In general, from a C++11 language-lawyer perspective, there's nothing you can do to make your program safe (other than a mutex, or hand-rolled locking with atomics, to prevent one thread from reading x while the other is writing it).
Relaxed atomics are sufficient to prevent the compiler from inventing writes, at no other cost, except maybe defeating auto-vectorization and the like if you were counting on other uses of this variable being aggressively optimized.
atomic_int x{0}, y{0};

r1 = x.load(mo_relaxed)  | r2 = y.load(mo_relaxed)
y.store(r1, mo_relaxed)  | x.store(r2, mo_relaxed)
Value-prediction could speculatively get a future value for r2 into the pipeline before thread 2 sees that value from y, but it can't actually become visible to other threads until the software or hardware knows for sure that the prediction was correct. (That would be inventing a write.)
e.g. thread 2 is allowed to compile as
r2 = y.load(mo_relaxed);
if (r2 == 42) { // control dependency, not a data dependency
x.store(42, mo_relaxed);
} else {
x.store(r2, mo_relaxed);
}
But as I said, x = 42; can't become visible to other threads until it's non-speculative (whether the speculation is in hardware or software), so value prediction can't invent values that other threads can see. The C++11 standard guarantees that atomic loads only observe values that were actually stored by some thread, and it recommends (though only as a non-normative "should") that implementations not produce out-of-thin-air values for relaxed atomics.
I don't know of, and can't think of, any mechanism by which a store of 42 could actually become visible to other threads before the y.load saw an actual 42 (i.e. LoadStore reordering of a load with a later dependent store). I don't think the C++ standard formally guarantees that, though. Maybe really aggressive inter-thread optimization, if the compiler can prove that r2 will always be 42 in some cases, could remove even the control dependency? An acquire-load or release-store would definitely be sufficient to block causality violations. This isn't quite mo_consume, because r2 is used as a value, not a pointer.
Answer 2:
Not by itself. In your example, there is nothing synchronising the two threads. In particular, the fences in the two threads do not cause the threads to synchronise at that point; for example, you might get the following sequence:
(Thread #1) | (Thread #2)
r1 = x |
[fence] |
y = junk temporary |
| r2 = y // junk!
| [fence]
| x = r2
y = r1 |
The simplest way to avoid out-of-thin-air results is to use atomic integers: if x and y are atomic, then they cannot take "out of thin air" values:
std::atomic<int> x{0}, y{0};
int r1 = x; | int r2 = y;
y = r1; | x = r2;
Source: https://stackoverflow.com/questions/51232730/preventing-of-out-of-thin-air-values-with-a-memory-barrier-in-c