Can anyone explain what std::memory_order is in plain English, and how to use it with std::atomic<>?
I found the reference and few
In brief, your compiler and CPU may execute instructions in an order different from the one you wrote. For a single thread this is not an issue, since the result will still appear correct. With multiple threads on multiple processors it becomes an issue. Memory ordering in C++ restricts what your compiler and CPU are allowed to do, which fixes such issues.
For example, if you look at my article on double-check locking you can see how ordering messes with that pattern -- it mentions how atomic memory ordering can be used to fix it.
About the reordering itself, you can also consider CPU Reordering; again, the compiler may be reordering as well.
Be aware that any documents on this topic (including mine) often speak of theoretical scenarios. The most common CPUs, like x86, have very strong ordering guarantees such that a lot of explicit ordering is simply not needed. Thus even if you don't use the proper C++11 atomics your code will likely still work.
As zvrba mentioned, the topic is actually quite detailed. The Linux kernel doc on memory barriers also contains a lot of detailed information.
Can anyone explain what is std::memory_order in plain English,
The best "Plain English" explanation I've found for the various memory orderings is Bartosz Milewski's article on relaxed atomics: http://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/
And the follow-up post: http://bartoszmilewski.com/2008/12/23/the-inscrutable-c-memory-model/
But note that whilst these articles are a good introduction, they pre-date the C++11 standard and won't tell you everything you need to know to use them safely.
and how to use them with std::atomic<>?
My best advice to you here is: don't. Relaxed atomics are (probably) the trickiest and most dangerous thing in C++11. Stick to std::atomic<T> with the default memory ordering (sequential consistency) until you're really, really sure that you have a performance problem that can be solved by using the relaxed memory orderings.
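To illustrate what "stick to the default" looks like in practice, here is a minimal sketch of a counter shared between threads. The function name and parameters are made up for this example; the point is only that no std::memory_order argument appears anywhere, so every operation is sequentially consistent.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not from the articles above): several threads
// increment one shared counter using only the default memory ordering.
int count_with_default_ordering(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counter;  // atomic increment, std::memory_order_seq_cst
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();  // the default argument is std::memory_order_seq_cst
}
```

Calling count_with_default_ordering(4, 10000) should return 40000, because each increment is a single atomic read-modify-write and no updates are lost.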
In the second article linked above, Bartosz Milewski reaches the following conclusion:
I had no idea what I was getting myself into when attempting to reason about C++ weak atomics. The theory behind them is so complex that it’s borderline unusable. It took three people (Anthony, Hans, and me) and a modification to the Standard to complete the proof of a relatively simple algorithm. Imagine doing the same for a lock-free queue based on weak atomics!
No. A "plain English" explanation takes 32 pages and can be found here.
If you don't want to read that, you can forget about memory ordering because the page you linked to says that the default is sequentially-consistent ordering, which is "always do the sane thing"-setting.
To use any other setting you really have to read and understand the above paper and the examples in it.
The std::memory_order values allow you to specify fine-grained constraints on the memory ordering provided by your atomic operations. If you are modifying and accessing atomic variables from multiple threads, then passing a std::memory_order value to your operations allows you to relax the constraints on the compiler and processor. Those constraints cover the order in which the operations on those atomic variables become visible to other threads, and the synchronization effects those operations have on the non-atomic data in your application.
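As a rough sketch of where the argument goes, the snippet below passes explicit std::memory_order values to a few std::atomic<int> member functions. The function name is invented for this example, and it runs on a single thread, so the orderings have no observable effect here; it only shows the syntax, not a safe use of relaxed orderings.

```cpp
#include <atomic>

// Hypothetical demo function: shows only where the ordering argument is
// passed. It is single-threaded, so the orderings change nothing here.
int explicit_ordering_syntax() {
    std::atomic<int> value{0};

    value.store(1, std::memory_order_release);         // store with release
    int seen = value.load(std::memory_order_acquire);  // load with acquire
    value.fetch_add(2, std::memory_order_relaxed);     // value becomes 3

    int expected = 3;
    // Compare-and-swap: writes 10 only if value still equals `expected`.
    bool swapped = value.compare_exchange_strong(
        expected, 10, std::memory_order_acq_rel);

    // Omitting the argument falls back to std::memory_order_seq_cst.
    return swapped ? seen + value.load() : -1;
}
```

Every one of these calls also works with the ordering argument left out, in which case std::memory_order_seq_cst is used.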
The default ordering of std::memory_order_seq_cst is the most constrained, and provides the "intuitive" properties you might expect: if thread A stores some data and then sets an atomic flag using std::memory_order_seq_cst, then if thread B sees the flag is set then it can see that data written by thread A. The other memory ordering values do not necessarily provide this guarantee, and must therefore be used very carefully.
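The flag scenario just described can be sketched as follows. The names payload, ready, and publish_and_read are assumptions made for this example; both the store and the load use the default std::memory_order_seq_cst.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: thread A (writer) publishes ordinary data through an
// atomic flag, and thread B (reader) picks it up once it sees the flag set.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag, default (seq_cst) ordering
    int observed = -1;

    std::thread writer([&] {
        payload = 42;       // 1. write the data
        ready.store(true);  // 2. set the flag (std::memory_order_seq_cst)
    });
    std::thread reader([&] {
        while (!ready.load()) {  // wait until the flag is seen as set
            std::this_thread::yield();
        }
        observed = payload;  // the write to payload is visible here
    });

    writer.join();
    reader.join();
    return observed;
}
```

Because the reader only touches payload after it has seen ready set, it reads 42 and there is no data race. With a weaker ordering such as std::memory_order_relaxed on both operations, that guarantee would no longer hold.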
The basic premise is: do not use anything other than std::memory_order_seq_cst (the default) unless (a) you really really know what you are doing, and can prove that the relaxed usage is safe in all cases, and (b) your profiler demonstrates that the data structure and operations you are intending to use the relaxed orderings with are a bottleneck.
My book, C++ Concurrency in Action, devotes a whole chapter (45 pages) to the details of the C++ memory model, atomic operations and the std::memory_order constraints, and a further chapter (44 pages) to using atomic operations for synchronization in lock-free data structures, and the consequences of relaxed ordering constraints.
My blog entries on Dekker's algorithm and Peterson's algorithm for mutual exclusion demonstrate some of the issues.
There is some plain English in the GCC wiki. ;)
http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync