Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic around an atomic?

假如想象 提交于 2019-11-30 04:01:46

No, standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic around an atomic(seq_cst).

Even standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of atomic(non-seq_cst) around an atomic(seq_cst).

Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

  • There shall be a single total order S on all memory_order_seq_cst operations - C++11 Standard:

§ 29.3

3

There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values: ...

  • But, any atomic operations with ordering weaker than memory_order_seq_cst hasn't sequential consistency and hasn't single total order, i.e. non-memory_order_seq_cst operations can be reordered with memory_order_seq_cst operations in allowed directions - C++11 Standard:

§ 29.3

8 [ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]


Also C++-compilers allows such reorderings:

  1. On x86_64

Usually - if in compilers seq_cst implemented as barrier after store, then:

STORE-C(relaxed); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(relaxed);

Screenshot of Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby

Also, theoretically possible - if in compilers seq_cst implemented as barrier before load, then:

STORE-A(seq_cst); LOAD-C(acq_rel); can be reordered to LOAD-C(acq_rel); STORE-A(seq_cst);

  1. On PowerPC

STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);

Also on PowerPC can be such reordering:

STORE-A(seq_cst); STORE-C(relaxed); can reordered to STORE-C(relaxed); STORE-A(seq_cst);

If even atomic variables are allowed to be reordered across atomic(seq_cst), then non-atomic variables can also be reordered across atomic(seq_cst).

Screenshot of Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8


More details:

  1. On x86_64

STORE-C(release); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(release);

Intel® 64 and IA-32 Architectures

8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

I.e. x86_64 code:

STORE-A(seq_cst);
STORE-C(release); 
LOAD-B(seq_cst);

Can be reordered to:

STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release); 

This can happen because between c.store and b.load isn't mfence:

x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO

C++ & asm - code:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);          // movl 2,[a]; mfence;
    c.store(4, std::memory_order_release);          // movl 4,[c];
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
}

It can be reordered to:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);          // movl 2,[a]; mfence;
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
    c.store(4, std::memory_order_release);          // movl 4,[c];
}

Also, Sequential Consistency in x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

  1. LOAD (without fence) and STORE + MFENCE
  2. LOAD (without fence) and LOCK XCHG
  3. MFENCE + LOAD and STORE (without fence)
  4. LOCK XADD ( 0 ) and STORE (without fence)
  • 1 and 2 ways: LOAD and (STORE+MFENCE)/(LOCK XCHG) - we reviewed above
  • 3 and 4 ways: (MFENCE+LOAD)/LOCK XADD and STORE - allow next reordering:

STORE-A(seq_cst); LOAD-C(acq_rel); can be reordered to LOAD-C(acq_rel); STORE-A(seq_cst);


  1. On PowerPC

STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);

Allows Store-Load reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf

Stores Reordered After Loads

I.e. PowerPC code:

STORE-A(seq_cst);
STORE-C(relaxed); 
LOAD-C(relaxed); 
LOAD-B(seq_cst);

Can be reordered to:

LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed); 
LOAD-B(seq_cst);

PowerPC - GCC 4.8 : https://godbolt.org/g/xowFD3

C++ & asm - code:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    a.store(2, std::memory_order_seq_cst);          // li r9<-2; sync; stw r9->[a];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

By dividing a.store into two parts - it can be reordered to:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);            // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    a.store(2, std::memory_order_seq_cst);          // part-2: stw r9->[a];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

Where load-from-memory lwz r9<-[c]; executed earlier than store-to-memory stw r9->[a];.


Also on PowerPC can be such reordering:

STORE-A(seq_cst); STORE-C(relaxed); can reordered to STORE-C(relaxed); STORE-A(seq_cst);

Because PowerPC has weak memory ordering model - allows Store-Store reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf

Stores Reordered After Stores

I.e. on PowerPC operations Store can be reordered with other Store, then previous example can be reordered such as:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);            // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    a.store(2, std::memory_order_seq_cst);          // part-2: stw r9->[a];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

Where store-to-memory stw r9->[c]; executed earlier than store-to-memory stw r9->[a];.

The std::memory_order_seq_cst guarantees there is no reordering by either compiler nor cpu. In this case the same memory order as if only one instruction where executed at a time.

But the compiler optimization confuses the issues, if you turn off -O3 then the fence is there.

The compiler can see that in your test program with -O3 that there are no consequence of the mfence as the program is too simple.

If you ran it on an Arm on the other hand like this you can see the barriers dmb ish.

So if your program is more complex you might see the mfence in this part of the code but not if the compiler can analyse and reason that it is not needed.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!