C++11 atomic x86 memory ordering

Question

In one of the docs for atomic variables in C++0x, when describing memory order, it mentions:

Release-Acquire Ordering

On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected...

First, is it true that x86 follows strict memory ordering? That seems very inefficient to impose all the time. Does it mean every write and read has a fence?

Also, if I have an aligned int, on an x86 system, do the atomic variables serve any purpose at all?

Answer 1:

Yes, it's true that x86 has strict memory ordering, see Volume 3A, Chapter 8.2 of the Intel manuals. Older x86 processors such as the 386 provided truly strict ordering (called strong ordering) semantics, while more modern x86 processors have slightly relaxed conditions in a few cases, but nothing you need to worry about. For example, the Pentium and 486 allow read cache misses to go ahead of buffered writes when the writes are cache hits (and are therefore to different addresses from the reads).

Yes, it can be inefficient. Sometimes high-performance software is written only for other architectures with looser memory ordering requirements because of this.

Yes, atomic variables still serve a purpose on x86. They have special semantics with the compiler such that a typical read-modify-write operation happens atomically. If you have two threads incrementing an atomic variable (by which I mean a variable of type std::atomic<T> in C++11) simultaneously, you can be assured that the final value will be 2 larger than the starting value. Without std::atomic, an increment can be lost: each thread may load the current value into a register, increment it there, and store it back, so both threads can end up storing the same result, even though each individual store to memory is atomic on x86.
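As a minimal sketch of the two-thread increment described above (the helper name count_with_atomic and its parameter are made up for illustration, not part of the original answer), the increment on a std::atomic<int> is one indivisible read-modify-write. On x86 compilers typically implement it with a lock-prefixed instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each increment a shared counter
// `per_thread` times; the final counter value is returned.
int count_with_atomic(int per_thread)
{
    assert(per_thread >= 0);
    std::atomic<int> counter{0};

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i)
            ++counter;  // atomic read-modify-write: no increment can be lost
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

Calling count_with_atomic(100000) returns 200000 on every run. If counter were a plain int, the program would contain a data race (undefined behavior), and in practice the result would often come out lower.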

Answer 2:

It is true that on x86 all stores have release and all loads have acquire semantics.

That doesn't and shouldn't affect the way you write C++: To write concurrent, race-free code you have to use either std::atomic constructions or locks.

What the architectural details mean is that on x86 there will be very little or no extra code generated for operations on atomic word-sized types as long as you ask for at most acquire/release ordering. (Sequential consistency will emit mfence instructions, though.) However, you still must use the C++ atomic types and cannot just omit them in order to have a correct, well-formed program. One important feature of atomic variables is that they prevent compiler reordering, which is essential to the correctness of your program.
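To illustrate the point about compiler reordering, here is a small sketch in which a plain int is published through an atomic flag. All names (payload, ready, publish, consume, run_once) are hypothetical. The release store keeps the earlier plain write from being moved after it, and the acquire load keeps the later plain read from being moved before it; on x86 both are expected to compile to ordinary mov instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data (hypothetical name)
std::atomic<bool> ready{false};  // flag that publishes the payload

void publish(int value)
{
    payload = value;                               // (1) plain store
    ready.store(true, std::memory_order_release);  // (2) (1) cannot be moved after this store
}

int consume()
{
    while (!ready.load(std::memory_order_acquire)) // (3) a real load on every iteration
        ;
    return payload;                                // (4) cannot be moved before (3)
}

// Runs one writer and one reader and returns what the reader observed.
int run_once(int value)
{
    assert(value != 0);
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread reader([&seen] { seen = consume(); });
    std::thread writer([value] { publish(value); });
    writer.join();
    reader.join();
    return seen;
}
```

With a plain bool instead of std::atomic<bool>, the compiler would be allowed to hoist the load out of the loop or to swap the two stores, and the program would have a data race regardless of how strongly ordered the hardware is.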

(Pre-C++11, you would have had to use compiler-provided extensions such as GCC's __sync_* suite of functions, which would make the compiler behave correctly. If you really wanted to use naked variables, you would at least have to insert compiler barriers yourself.)

Answer 3:

There's a nice table of the different reordering operations that can occur; it shows that x86, for example, performs very few of them. Other architectures (notoriously Alpha) perform almost any of them.

As for the memory models defined by the standard, x86 and similar architectures are inherently compliant.

Your question about atomic variables has a slightly different answer. Any read-modify-write of a plain variable involves a race condition: when multiple threads update the same variable, an update can be lost. Atomic variables are defined such that they are the correct type for atomic operations, which eliminate this race condition. So one of their purposes has nothing to do with ordering.
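The lost-update problem also applies to updates that have no dedicated member function. As a hedged sketch (store_max and max_of_threads are hypothetical helpers, not something from the original answer), such an update can be built from compare_exchange_weak, which only writes if no other thread changed the value in between.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: raise `target` to at least `candidate` without
// overwriting a concurrent update made by another thread.
void store_max(std::atomic<int>& target, int candidate)
{
    int seen = target.load();
    // compare_exchange_weak writes `candidate` only if `target` still holds
    // `seen`; on failure it refreshes `seen` and the condition is re-checked.
    while (seen < candidate && !target.compare_exchange_weak(seen, candidate))
        ;
}

// Starts `thread_count` threads with ids 1..thread_count; each one tries to
// record its id as the maximum. Returns the final maximum.
int max_of_threads(int thread_count)
{
    assert(thread_count > 0);
    std::atomic<int> highest{0};
    std::vector<std::thread> threads;
    for (int id = 1; id <= thread_count; ++id)
        threads.emplace_back([&highest, id] { store_max(highest, id); });
    for (auto& t : threads)
        t.join();
    return highest.load();
}
```

Whatever order the threads run in, max_of_threads(8) returns 8. The same loop written with a plain int and an if statement could lose the largest value when two threads interleave.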

Answer 4:

Note that release/acquire semantics do not necessarily imply an mfence after each instruction. x86 does permit a limited relaxation (a load may be reordered ahead of an older store to a different location), as can be seen in the manual referenced by @Adam Rosenfield or with a quick look on Wikipedia. Nevertheless, x86 has release semantics for stores and acquire semantics for loads.

From Kerrek SB's Answer:

What the architectural details mean is that on x86 there will be very little or no extra code generated for operations on atomic word-sized types as long as you ask for at most acquire/release ordering. (Sequential consistency will emit mfence instructions, though.)

Note that sequential consistency is the default! (See for example cppreference).

This means that...

#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;

void producer()
{
    std::string* p  = new std::string("Hello");
    ptr = p;
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr))
        ;
    assert(*p2 == "Hello"); // never fails
}

(g++ -std=c++11 -S -O3 on x86)

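... will actually result in an mfence being emitted in the producer function to account for the aforementioned relaxation on x86. (Some compilers emit an implicitly locked xchg instead of mov plus mfence; it is also a full barrier.)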

Whereas for...

#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;

void producer()
{
    std::string* p  = new std::string("Hello");
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fails
}

(g++ -std=c++11 -S -O3 on x86)

...no mfence will be inserted because x86 has release semantics for stores and acquire semantics for loads.
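The relaxation that makes the barrier necessary for sequentially consistent stores can be sketched with a "store buffering" litmus test. This is an illustrative sketch; the function names are made up. Each thread stores to one variable and then loads the other. With memory_order_seq_cst the outcome r1 == 0 && r2 == 0 is forbidden by the C++ memory model; with release stores and acquire loads it is permitted, and x86 hardware can actually produce it, because a load may pass an older store to a different location.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test: returns true if both threads read 0.
bool both_saw_zero_once()
{
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // store, then ...
        r1 = y.load(std::memory_order_seq_cst);  // ... load the other variable
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_both_zero(int rounds)
{
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i)
        if (both_saw_zero_once())
            ++hits;
    return hits;
}
```

With sequential consistency the count is always 0. Replacing the orderings with memory_order_release and memory_order_acquire removes the barrier on x86 and makes a nonzero count possible, although how often it shows up depends on timing.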

Source: https://stackoverflow.com/questions/11836028/c11-atomic-x86-memory-ordering

Tags: c++, c++11, atomic