memory-model | 易学教程

atomic load store with memory order

Question: Thread A runs x.store(1, std::memory_order_release) first, then thread B runs x.load(std::memory_order_acquire). The load in thread B is not guaranteed to read the 1 stored by A. If I use memory_order_seq_cst, will it be guaranteed to read 1?

Answer 1: There is no difference between memory orderings with regard to loads and stores of a single atomic variable. This is because std::memory_order specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation. Read std::memory_order for ...
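The point of the answer above can be illustrated with a minimal sketch. The helper name `observe_after_spin` is hypothetical and not part of the original question. A single load of `x` may return 0 or 1 whichever memory order is chosen, so the reader loops until the store becomes visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the original question): thread A stores 1,
// the calling thread plays the role of thread B.
int observe_after_spin() {
    std::atomic<int> x{0};

    // Thread A: a release store to x.
    std::thread a([&] { x.store(1, std::memory_order_release); });

    // Thread B: any single acquire load may still return 0, because nothing
    // orders it after A's store. Looping is what makes the result definite.
    int seen = 0;
    while ((seen = x.load(std::memory_order_acquire)) == 0) {
        std::this_thread::yield();
    }

    a.join();
    return seen;  // the only non-zero value ever stored is 1
}
```

Replacing both orders with `std::memory_order_seq_cst` would not remove the need for the loop: the memory order constrains how other accesses are ordered around the operation, not how soon a store becomes visible to another thread.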

Weak guarantees for non-atomic writes on GPUs?

Question: OpenCL and CUDA have included atomic operations for several years now (although obviously not every CUDA or OpenCL device supports these). But my question is about the possibility of "living with" races due to non-atomic writes. Suppose several threads in a grid all write to the same location in global memory. Are we guaranteed that, when kernel execution has concluded, the results of one of these writes will be present in that location, rather than some junk? Relevant parameters for this ...

C++ value representation of non-trivially-copyable types

Question: The current draft of the C++ standard (March 2019) has the following paragraph ([basic.types] p.4) (emphasis mine):

    The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For ...

Possible to use C11 fences to reason about writes from other threads?

Question: Adve and Gharachorloo's report, in Figure 4b, provides an example of a program that exhibits unexpected behavior in the absence of sequential consistency. My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the ...

Does this example contain a data race?

Question: Here is the original question, although mine differs from it in some ways: "C++ memory model - does this example contain a data race?" My question:

    // CODE-1: initially, x == 0 and y == 0
    if (x) y++;  // pthread 1
    if (y) x++;  // pthread 2

Note: the code above is written in C, not C++ (without a memory model). So does it contain a data race? From my point of view: if we view the code under the sequential consistency memory model, there is no data race, because x and y will never be both non-zero at the same ...

Can std::atomic memory barriers be used to transfer non-atomic data between threads?

Question: Is the following code standards compliant? (Or can it be made compliant without making x atomic or volatile?) This is similar to an earlier question; however, I would like a citation to the relevant section of the C++ standard, please. My concern is that atomic store() and load() do not provide sufficient compiler barriers for the non-atomic variables (x in the example below) to have correct release and acquire semantics. My goal is to implement lock-free primitives, such as queues, that can ...
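The asker's own code is not reproduced in this excerpt, so the following is only a sketch of the general pattern the question describes, under assumed names: `transfer_plain_int`, `ready`, and `received` are hypothetical. A non-atomic `int x` is written before a release store to a flag and read after an acquire load that observes that store, which gives the write to `x` a happens-before relationship with the read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch (not the asker's code): hand a plain int from one
// thread to another through a release/acquire pair on an atomic flag.
int transfer_plain_int() {
    int x = 0;                       // non-atomic payload
    std::atomic<bool> ready{false};  // synchronization flag
    int received = -1;

    std::thread producer([&] {
        x = 42;                                        // plain write
        ready.store(true, std::memory_order_release);  // publish
    });

    std::thread consumer([&] {
        // Spin until the acquire load observes the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        received = x;  // ordered after the write to x, so no data race
    });

    producer.join();
    consumer.join();
    return received;
}
```

In this sketch the write `x = 42` cannot be moved after the release store, and the read of `x` cannot be moved before the acquire load, by either the compiler or the hardware. Whether the asker's specific code is compliant depends on details that the excerpt cuts off.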

What does “store-buffer forwarding” mean in the Intel developer's manual?

Question: The Intel 64 and IA-32 Architectures Software Developer's Manual says the following about reordering of actions by a single processor (Section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families"):

    Reads may be reordered with older writes to different locations but not with older writes to the same location.

Then below, when discussing points where this is relaxed compared to earlier processors, it says:

    Store-buffer forwarding, when a read passes a write to the same memory ...

C++ standard: can relaxed atomic stores be lifted above a mutex lock?

Question: Is there any wording in the standard that guarantees that relaxed stores to atomics won't be lifted above the locking of a mutex? If not, is there any wording that explicitly says that it's kosher for the compiler or CPU to do so? For example, take the following program:

    std::mutex mu;
    int foo = 0;  // Guarded by mu
    std::atomic<bool> foo_has_been_set{false};

    void SetFoo() {
      mu.lock();
      foo = 1;
      foo_has_been_set.store(true, std::memory_order_relaxed);
      mu.unlock();
    }

    void CheckFoo() {
      if (foo_has ...

Preventing of Out of Thin Air values with a memory barrier in C++

Question: Let's consider the following two-thread concurrent program in C++. x and y are globals, r1 and r2 are thread-local, and stores and loads of int are atomic. Memory model = C++11.

    int x = 0, int y = 0
    r1 = x | r2 = y
    y = r1 | x = r2

A compiler is allowed to compile it as:

    int x = 0, int y = 0
    r1 = x | r2 = 42
    y = r1 | x = r2
           | if(y != 42)
           |     x = r2 = y

And, while it is intra-thread consistent, it can produce wild results, because it is possible that execution of that program results in (x, y) = (42, 42). It is ...

sequentially-consistent atomic load on x86

Question: I'm interested in the sequentially-consistent load operation on x86. As far as I can see from the assembler listing generated by the compiler, it is implemented as a plain load on x86. However, as far as I know, plain loads are guaranteed to have acquire semantics, while plain stores are guaranteed to have release semantics. A sequentially-consistent store is implemented as a locked xchg, while a load is a plain load. That sounds strange to me; could you please explain this in detail? Added: I just found on the internet that ...
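One way to see why the cost lands on the store is the classic store-buffering litmus test, sketched below. The function name `count_both_zero` and the iteration count are assumptions for illustration. x86 lets a load pass an older store to a different location, so sequential consistency needs a full barrier somewhere between a store and a later load. Compilers typically place that barrier on the store (xchg, or mov followed by mfence), which lets loads stay plain mov instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus-test harness: counts the runs in which both threads
// read 0. With seq_cst on all four operations the C++ memory model forbids
// that outcome, so the count is expected to stay at zero.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

If the stores were weakened to `std::memory_order_release` and the loads to `std::memory_order_acquire`, the outcome r1 == 0 and r2 == 0 would become permitted, and it can be observed on real x86 hardware, although a loop this simple may not trigger it reliably.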