memory-model

Regarding instruction ordering in executions of cache-miss loads before cache-hit stores on x86

Submitted by 天涯浪子 on 2019-12-01 23:29:42
Given the small program shown below (handcrafted to look the same from a sequential consistency / TSO perspective), and assuming it is being run by a superscalar out-of-order x86 CPU:

```
Load A        ; A is in main memory
Load B        ; B is in L2
Store C, 123  ; C is in L1
```

I have a few questions. Assuming a big enough instruction window, will the three instructions be fetched, decoded, and executed at the same time? I assume not, as that would break execution in program order. The first load will take longer to fetch A from main memory than the second takes to fetch B from L2. Will the latter have to wait until the former is fully executed? Will …

How do data caches route the object in this example?

Submitted by 时间秒杀一切 on 2019-12-01 20:04:17
Consider the diagrammed data cache architecture:

```
--------------------------------------
| CPU core A | CPU core B |          |
|------------|------------| Devices  |
|  Cache A1  |  Cache B1  | with DMA |
|-------------------------|          |
|         Cache 2         |          |
|------------------------------------|
|                RAM                 |
--------------------------------------
```

Suppose that an object is shadowed on a dirty line of Cache A1, an older version of the same object is shadowed on a clean line of Cache 2, and the newest version of the same object has recently been written to RAM via DMA. …

Do locked instructions provide a barrier between weakly-ordered accesses?

Submitted by 梦想的初衷 on 2019-12-01 19:52:11
Question: On x86, lock-prefixed instructions such as lock cmpxchg provide barrier semantics in addition to their atomic operation: for normal memory accesses on write-back memory regions, reads and writes are not reordered across lock-prefixed instructions, per section 8.2.2 of Volume 3 of the Intel SDM:

"Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions."

(This section applies only to write-back memory types.) In the same list, you find an …

Do exchange or compare_and_exchange read the last value in modification order?

Submitted by 末鹿安然 on 2019-12-01 06:51:54
Question: I am reading C++ Concurrency in Action by Anthony Williams. The section "Understanding Relaxed Ordering" has:

"There are a few additional things you can tell the man in the cubicle, such as 'write down this number, and tell me what was at the bottom of the list' (exchange) and 'write down this number if the number on the bottom of the list is that; otherwise tell me what I should have guessed' (compare_exchange_strong), but that doesn't affect the general principle."

Does it mean that such …

Is mov + mfence safe on NUMA?

Submitted by 梦想的初衷 on 2019-12-01 06:15:01
I see that g++ generates a simple mov for x.load() and mov + mfence for x.store(y). Consider this classic example:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> x, y;
bool r1;
bool r2;

void go1() { x.store(true); }
void go2() { y.store(true); }

void go3() {
    bool a = x.load();
    bool b = y.load();
    r1 = a && !b;
}

void go4() {
    bool b = y.load();
    bool a = x.load();
    r2 = b && !a;
}

int main() {
    std::thread t1(go1);
    std::thread t2(go2);
    std::thread t3(go3);
    std::thread t4(go4);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return r1 * 2 + r2;
}
```

in which, according to https://godbolt.org/z/APS4ZY, go1 and go2 are …

pthread_create(3) and memory synchronization guarantee in SMP architectures

Submitted by 谁说我不能喝 on 2019-11-30 20:15:44
Question: I am looking at section 4.11 of The Open Group Base Specifications Issue 7 (IEEE Std 1003.1, 2013 Edition), which spells out the memory synchronization rules. This is the most specific description of the POSIX/C memory model I have managed to find in the POSIX standard. Here is a quote:

"4.11 Memory Synchronization — Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread …"

Memory Model: preventing store-release and load-acquire reordering

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-11-30 15:25:12
It is known that, unlike Java's volatiles, .NET's allow reordering of a volatile write with a following volatile read from another location. When this is a problem, a MemoryBarrier is recommended between them, or Interlocked.Exchange can be used instead of the volatile write. That works, but MemoryBarrier can be a performance killer when used in highly optimized lock-free code. I thought about it a bit and came up with an idea; I want somebody to tell me whether I have taken the right approach. So, the idea is the following: we want to prevent reordering between these two accesses: volatile1 write …

Acquire/release semantics with 4 threads

Submitted by 无人久伴 on 2019-11-30 06:30:27
I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.

```cpp
#include <atomic>
#include <thread>
#include <assert.h>

std::atomic<bool> x, y;
std::atomic<int> z;

void write_x() {
    x.store(true, std::memory_order_release);
}

void write_y() {
    y.store(true, std::memory_order_release);
}

void read_x_then_y() {
    while (!x.load(std::memory_order_acquire));
    if (y.load(std::memory_order_acquire))
        ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_acquire));
    if (x.load(std::memory_order_acquire))
        ++z;
}
```

…

How do memory_order_seq_cst and memory_order_acq_rel differ?

Submitted by 痞子三分冷 on 2019-11-30 00:01:21
Stores are release operations and loads are acquire operations for both. I know that memory_order_seq_cst is meant to impose an additional total ordering on all operations, but I am failing to build an example where this matters if all the memory_order_seq_cst operations are replaced by memory_order_acq_rel. Am I missing something, or is the difference just a documentation effect, i.e. should one use memory_order_seq_cst when not intending to play with a more relaxed model, and memory_order_acq_rel when constraining the relaxed model?

Answer by MSN: http://en.cppreference.com/w/cpp/atomic/memory_order has a …