memory-barriers

Why flush the pipeline for Memory Order Violation caused by other logical processors?

ぐ巨炮叔叔 submitted on 2020-02-28 04:02:50
Question: The Memory Order Machine Clear performance event is described by the VTune documentation as: "The memory ordering (MO) machine clear happens when a snoop request from another processor matches a source for a data operation in the pipeline. In this situation the pipeline is cleared before the loads and stores in progress are retired." However, I don't see why that should be the case. There is no synchronisation order between loads and stores on different logical processors. The processor could

VarHandle get/setOpaque

此生再无相见时 submitted on 2020-02-27 23:29:14
Question: I keep fighting to understand what VarHandle::setOpaque and VarHandle::getOpaque are really doing. It has not been easy so far - there are some things I think I get (but will not present them in the question itself, so as not to muddy the waters), but overall this is misleading at best for me. The documentation: "Returns the value of a variable, accessed in program order..." Well, in my understanding, if I have: int xx = x; // read x int yy = y; // read y these reads can be reordered. On the other

What happens to expected memory semantics (such as read after write) when a thread is scheduled on a different CPU core?

 ̄綄美尐妖づ submitted on 2020-02-24 11:13:30
Question: Code within a single thread has certain memory guarantees, such as read after write (i.e. writing some value to a memory location, then reading it back should give the value you wrote). What happens to such memory guarantees if a thread is rescheduled to execute on a different CPU core? Say a thread writes 10 to memory location X, then gets rescheduled to a different core. That core's L1 cache might have a different value for X (from another thread that was executing on that core previously),

StoreStore reordering happens when compiling C++ for x86

大兔子大兔子 submitted on 2020-01-25 20:40:48
Question: while(true) { int x(0), y(0); std::thread t0([&x, &y]() { x=1; y=3; }); std::thread t1([&x, &y]() { std::cout << "(" << y << ", " << x << ")" << std::endl; }); t0.join(); t1.join(); } Firstly, I know that this is UB because of the data race. But I expected only the following outputs: (3,1), (0,1), (0,0). I was convinced that it was not possible to get (3,0), but I did. So I am confused - after all, x86 doesn't allow StoreStore reordering, so x = 1 should be globally visible before y = 3, I suppose

Loads and stores reordering on ARM

試著忘記壹切 submitted on 2020-01-13 04:23:06
Question: I'm not an ARM expert, but won't those stores and loads be subject to reordering, at least on some ARM architectures? atomic<int> atomic_var; int nonAtomic_var; int nonAtomic_var2; void foo() { atomic_var.store(111, memory_order_relaxed); atomic_var.store(222, memory_order_relaxed); } void bar() { nonAtomic_var = atomic_var.load(memory_order_relaxed); nonAtomic_var2 = atomic_var.load(memory_order_relaxed); } I've had no success in making the compiler put memory barriers between them. I've

Thread safe usage of lock helpers (concerning memory barriers)

折月煮酒 submitted on 2020-01-11 05:32:07
Question: By lock helpers I am referring to disposable objects with which locking can be implemented via using statements. For example, consider a typical usage of the SyncLock class from Jon Skeet's MiscUtil: public class Example { private readonly SyncLock _padlock; public Example() { _padlock = new SyncLock(); } public void ConcurrentMethod() { using (_padlock.Lock()) { // Now own the padlock - do concurrent stuff } } } Now, consider the following usage: var example = new Example(); new Thread

Can atomic ops based spin lock's Unlock directly set the lock flag to zero?

拈花ヽ惹草 submitted on 2020-01-05 02:48:13
Question: Say, for example, I have an exclusive atomic-ops-based spin lock implementation as below: bool TryLock(volatile TInt32 * pFlag) { return !(AtomicOps::Exchange32(pFlag, 1) == 1); } void Lock (volatile TInt32 * pFlag) { while (AtomicOps::Exchange32(pFlag, 1) == 1) { AtomicOps::ThreadYield(); } } void Unlock (volatile TInt32 * pFlag) { *pFlag = 0; // is this ok? or is atomicity needed here as well for the load and store? } Where AtomicOps::Exchange32 is implemented on Windows using

The ordering of L1 cache controller to process memory requests from CPU

醉酒当歌 submitted on 2020-01-02 12:58:31
Question: Under the total store order (TSO) memory consistency model, an x86 CPU has a write buffer to hold write requests and can serve reordered read requests from that write buffer. The write requests in the write buffer exit and are issued toward the cache hierarchy in FIFO order, which is the same as program order. I am curious: to serve the write requests issued from the write buffer, does the L1 cache controller handle the write requests, finish the cache coherence of the

In OpenCL, what does mem_fence() do, as opposed to barrier()?

為{幸葍}努か submitted on 2020-01-02 01:04:08
Question: Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10), for mem_fence(): "Orders loads and stores of a work-item executing a kernel." (So it applies to a single work item.) But, at the same time, in section 3.3.1, it says that: "Within a work-item memory has load / store consistency." So within a work item the memory is consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items,