memory-model

Regarding instruction ordering in executions of cache-miss loads before cache-hit stores on x86

Submitted by 天涯浪子 on 2019-12-01 23:29:42
Given the small program shown below (handcrafted to look the same from a sequential consistency / TSO perspective), and assuming it is being run by a superscalar out-of-order x86 CPU:

```
Load A        ; A is in main memory
Load B        ; B is in L2
Store C, 123  ; C is in L1
```

I have a few questions. Assuming a big enough instruction window, will the three instructions be fetched, decoded, and executed at the same time? I assume not, as that would break execution in program order. The first load will take longer to fetch A from main memory than the second takes to fetch B from L2. Will the latter have to wait until the former is fully executed? Will …

How do data caches route the object in this example?

Submitted by 时间秒杀一切 on 2019-12-01 20:04:17
Consider the diagrammed data cache architecture:

```
--------------------------------------
| CPU core A | CPU core B |          |
|------------|------------| Devices  |
|  Cache A1  |  Cache B1  | with DMA |
|-------------------------|          |
|         Cache 2         |          |
|------------------------------------|
|                RAM                 |
--------------------------------------
```

Suppose that an object is shadowed on a dirty line of Cache A1, an older version of the same object is shadowed on a clean line of Cache 2, and the newest version of the same object has recently been written to RAM via DMA. …

Do locked instructions provide a barrier between weakly-ordered accesses?

Submitted by 梦想的初衷 on 2019-12-01 19:52:11
Question: On x86, lock-prefixed instructions such as lock cmpxchg provide barrier semantics in addition to their atomic operation: for normal memory accesses on write-back memory regions, reads and writes are not reordered across lock-prefixed instructions, per section 8.2.2 of Volume 3 of the Intel SDM:

"Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions."

(This section applies only to write-back memory types.) In the same list, you find an …

Do exchange or compare_and_exchange read the last value in modification order?

Submitted by 末鹿安然 on 2019-12-01 06:51:54
Question: I am reading C++ Concurrency in Action by Anthony Williams. The section "Understanding Relaxed Ordering" has:

"There are a few additional things you can tell the man in the cubicle, such as 'write down this number, and tell me what was at the bottom of the list' (exchange) and 'write down this number if the number on the bottom of the list is that; otherwise tell me what I should have guessed' (compare_exchange_strong), but that doesn't affect the general principle."

Does it mean that such …

Is mov + mfence safe on NUMA?

Submitted by 梦想的初衷 on 2019-12-01 06:15:01
I see that g++ generates a simple mov for x.load() and mov + mfence for x.store(y). Consider this classic example:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> x, y;
bool r1;
bool r2;

void go1() { x.store(true); }
void go2() { y.store(true); }

void go3() {
    bool a = x.load();
    bool b = y.load();
    r1 = a && !b;
}

void go4() {
    bool b = y.load();
    bool a = x.load();
    r2 = b && !a;
}

int main() {
    std::thread t1(go1);
    std::thread t2(go2);
    std::thread t3(go3);
    std::thread t4(go4);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return r1 * 2 + r2;
}
```

in which, according to https://godbolt.org/z/APS4ZY, go1 and go2 are …

pthread_create(3) and memory synchronization guarantee in SMP architectures

Submitted by 谁说我不能喝 on 2019-11-30 20:15:44
Question: I am looking at section 4.11 of The Open Group Base Specifications Issue 7 (IEEE Std 1003.1, 2013 Edition), which spells out the memory synchronization rules. This is the most specific description of the POSIX/C memory model I have managed to find in the POSIX standard. Here is a quote:

"4.11 Memory Synchronization — Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread …"

Memory Model: preventing store-release and load-acquire reordering

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-11-30 15:25:12
It is known that, unlike Java's volatiles, .NET's allow reordering of a volatile write with a following volatile read from another location. When this is a problem, a MemoryBarrier is recommended between them, or Interlocked.Exchange can be used instead of the volatile write. That works, but MemoryBarrier can be a performance killer when used in highly optimized lock-free code. I thought about it a bit and came up with an idea; I want somebody to tell me whether I have taken the right approach. So, the idea is the following: we want to prevent reordering between these two accesses: volatile1 write …

Acquire/release semantics with 4 threads

Submitted by 无人久伴 on 2019-11-30 06:30:27
I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.

```cpp
#include <atomic>
#include <thread>
#include <assert.h>

std::atomic<bool> x, y;
std::atomic<int> z;

void write_x() {
    x.store(true, std::memory_order_release);
}

void write_y() {
    y.store(true, std::memory_order_release);
}

void read_x_then_y() {
    while (!x.load(std::memory_order_acquire));
    if (y.load(std::memory_order_acquire))
        ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_acquire));
    if (x.load(std::memory_order_acquire))
        ++z;
}
```

…

How do memory_order_seq_cst and memory_order_acq_rel differ?

Submitted by 痞子三分冷 on 2019-11-30 00:01:21
Stores are release operations and loads are acquire operations for both. I know that memory_order_seq_cst is meant to impose an additional total ordering on all operations, but I am failing to build an example where this matters if all the memory_order_seq_cst operations are replaced by memory_order_acq_rel. Am I missing something, or is the difference just a documentation effect, i.e. should one use memory_order_seq_cst when not intending to play with a more relaxed model, and memory_order_acq_rel when constraining the relaxed model?

Answer by MSN: http://en.cppreference.com/w/cpp/atomic/memory_order has a …