memory-barriers

Memory Barriers: a Hardware View for Software Hackers Example 3

Submitted by 自闭症网瘾萝莉.ら on 2019-12-06 02:55:44
I am copying the text for that figure from the original paper, Memory Barriers: a Hardware View for Software Hackers. Table 4 shows three code fragments, executed concurrently by CPUs 0, 1, and 2. All variables are initially zero. Note that neither CPU 1 nor CPU 2 can proceed to line 5 until they see CPU 0's assignment to "b" on line 3. Once CPUs 1 and 2 have executed their memory barriers on line 4, they are both guaranteed to see all assignments by CPU 0 preceding its memory barrier on line 2. Similarly, CPU 0's memory barrier on line 8 pairs with those of CPUs 1 and 2 on line 4, so that CPU 0 does not execute its assignment on line 9 until its earlier assignments are visible to both of the other CPUs.
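Table 4 itself did not survive the copy, so here is a sketch of the structure the quoted text describes, rendered with C++11 fences. Only the variable name "b" and the line roles come from the text above; the other variable names (a, c, d, e), the exact layout, and the use of seq_cst fences in place of the paper's kernel-style smp_mb() are my assumptions:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> a{0}, b{0}, c{0}, d{0}, e{0};

void cpu0() {
    a.store(1, std::memory_order_relaxed);               // line 1
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 2: barrier
    b.store(1, std::memory_order_relaxed);               // line 3
    while (c.load(std::memory_order_relaxed) == 0) {}    // wait for CPU 1
    while (d.load(std::memory_order_relaxed) == 0) {}    // wait for CPU 2
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 8: pairs with the
    e.store(1, std::memory_order_relaxed);               // line 9   barriers on line 4
}

void cpu1() {
    while (b.load(std::memory_order_relaxed) == 0) {}    // cannot reach line 5 yet
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 4: barrier
    c.store(1, std::memory_order_relaxed);               // line 5: a == 1 is visible now
}

void cpu2() {
    while (b.load(std::memory_order_relaxed) == 0) {}
    std::atomic_thread_fence(std::memory_order_seq_cst);
    d.store(1, std::memory_order_relaxed);
}

int main() {
    std::thread t0{cpu0}, t1{cpu1}, t2{cpu2};
    t0.join(); t1.join(); t2.join();
}
```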

Deep understanding of volatile in Java

Submitted by 元气小坏坏 on 2019-12-05 20:28:24
Does Java allow the output 1, 0? I've tested it very intensively and I cannot get that output; I only ever get 1, 1 or 0, 0 or 0, 1.

```java
public class Main {
    private int x;
    private volatile int g;

    // Executed by thread #1
    public void actor1() {
        x = 1;
        g = 1;
    }

    // Executed by thread #2
    public void actor2() {
        put_on_screen_without_sync(g);
        put_on_screen_without_sync(x);
    }
}
```

Why? To my eye it should be possible to get 1, 0. My reasoning: g is volatile, so memory order will be ensured. So it looks like:

actor1: (1) store(x, 1) (2) store(g, 1) (3) memory_barrier // on x86

and I see the following …
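For comparison, the same shape can be written in C++ with std::atomic (my own analogue, not part of the question), where the release/acquire pairing makes explicit why "1, 0" is the one forbidden output:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0};  // relaxed ops stand in for the plain Java field
std::atomic<int> g{0};  // release/acquire stands in for Java volatile

void actor1() {
    x.store(1, std::memory_order_relaxed);
    g.store(1, std::memory_order_release);       // like the Java volatile write
}

void actor2() {
    int gv = g.load(std::memory_order_acquire);  // like the Java volatile read
    int xv = x.load(std::memory_order_relaxed);
    // If gv == 1, the acquire load synchronized with the release store,
    // so xv must also be 1: "1, 0" can never be printed.
    std::printf("%d, %d\n", gv, xv);
}

int main() {
    std::thread t1{actor1}, t2{actor2};
    t1.join(); t2.join();
}
```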

Force order of execution of C statements?

Submitted by 青春壹個敷衍的年華 on 2019-12-05 14:17:31
I have a problem with the MS C compiler reordering certain statements, critical in a multithreading context, at high levels of optimization. I want to know how to force ordering in specific places while still using high levels of optimization. (At low levels of optimization, this compiler does not reorder statements.) The following code:

```c
ChunkT* plog2sizeChunk = ...
SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit on this chunk of storage
x = plog2sizeChunk->pNext;
```

produces this: …
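One portable way to pin the ordering without giving up optimization (my suggestion; the MSVC-specific discussion is cut off above) is a compiler-only fence between the two statements. ChunkT and SET_BUSY below are hypothetical stand-ins for the question's definitions:

```cpp
#include <atomic>

struct ChunkT { unsigned pPoolAndBusyFlag; ChunkT* pNext; }; // stand-in type
#define SET_BUSY(f) ((f) |= 0x1u)                            // stand-in macro

ChunkT* x;

void mark_busy_then_read(ChunkT* plog2sizeChunk) {
    SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit
    // Compiler-only barrier: in practice the optimizer will not move
    // memory accesses across it. No CPU fence instruction is emitted,
    // so this constrains the compiler, not the hardware.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    x = plog2sizeChunk->pNext;
}
```

If other threads read the busy flag concurrently, a compiler barrier alone is not enough; the flag itself should become a std::atomic with at least release semantics on the store.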

In OpenCL, what does mem_fence() do, as opposed to barrier()?

Submitted by 邮差的信 on 2019-12-05 00:50:14
Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10) that mem_fence() "orders loads and stores of a work-item executing a kernel", so it applies to a single work-item. But at the same time, section 3.3.1 says that "within a work-item memory has load / store consistency", so within a work-item the memory is already consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items, yet isn't needed within an item... Note that I haven't used atomic operations (section 9.5 etc.). Is …
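The classic use is ordering one work-item's stores as *other* items observe them: write the payload, fence, then write a flag. Below is a C++ analogy of that pattern (my sketch, not OpenCL; in a kernel the fences would be write_mem_fence / read_mem_fence and the two functions would be separate work-items):

```cpp
#include <atomic>
#include <thread>

int data = 0;
std::atomic<int> flag{0};

void producer() {                                         // plays work-item A
    data = 42;                                            // payload first
    std::atomic_thread_fence(std::memory_order_release);  // ~ write_mem_fence
    flag.store(1, std::memory_order_relaxed);             // then the flag
}

void consumer() {                                         // plays work-item B
    while (flag.load(std::memory_order_relaxed) == 0) {}  // poll the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // ~ read_mem_fence
    // The fences order A's two stores as B observes them: data == 42 here.
}

int main() {
    std::thread c{consumer}, p{producer};
    p.join(); c.join();
}
```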

difference between Memory Barriers and lock prefixed instruction

Submitted by 别等时光非礼了梦想. on 2019-12-04 23:32:22
In the article Memory Barriers and JVM Concurrency, I was told that volatile is implemented with various memory-barrier instructions, while synchronized and atomic are implemented with lock-prefixed instructions. But I came across the following code in another article:

Java code:

```java
volatile Singleton instance = new Singleton();
```

Assembly instructions (x86):

```
0x01a3de1d: movb $0x0,0x1104800(%esi)
0x01a3de24: lock addl $0x0,(%esp)
```

So which one is right? And what is the difference between memory barriers and lock-prefixed instructions?

Short answer: lock instructions are used to …
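Both articles describe the same thing at different levels: the barrier is the abstract requirement, and on x86 a lock-prefixed instruction is one way to satisfy its StoreLoad part. The same choice shows up in C++ codegen (my example, not from either article):

```cpp
#include <atomic>

std::atomic<int> ready{0};

void publish() {
    // On x86-64, gcc and clang typically compile this seq_cst store either
    // to a single `xchg` (which carries lock semantics) or, in older
    // codegen, to `mov` followed by `mfence`. Either form supplies the
    // same full barrier the JVM gets from `lock addl $0x0,(%esp)` after
    // a volatile store.
    ready.store(1, std::memory_order_seq_cst);
}
```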

Is memory barrier or atomic operation required in a busy-wait loop?

Submitted by ▼魔方 西西 on 2019-12-04 16:37:21
Consider the following spin_lock() implementation, originally from this answer:

```c
void spin_lock(volatile bool* lock)
{
    for (;;) {
        // inserts an acquire memory barrier and a compiler barrier
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;

        while (*lock)  // no barriers; is it OK?
            cpu_relax();
    }
}
```

What I already know: volatile prevents the compiler from optimizing out the *lock re-read on each iteration of the while loop, but volatile inserts neither memory nor compiler barriers; such an …
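For comparison, here is the same test-and-test-and-set shape in portable C++11 (my sketch, not from the question). The inner wait loop can stay relaxed because the next exchange re-runs with acquire ordering before the critical section is entered:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> locked{false};

void spin_lock() {
    for (;;) {
        if (!locked.exchange(true, std::memory_order_acquire))
            return;                                     // lock acquired
        while (locked.load(std::memory_order_relaxed))  // cheap wait loop
            std::this_thread::yield();                  // stand-in for cpu_relax()
    }
}

void spin_unlock() {
    locked.store(false, std::memory_order_release);
}
```

Usage is the obvious bracket: spin_lock(); /* critical section */ spin_unlock();.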

relaxed ordering as a signal

Submitted by 南笙酒味 on 2019-12-04 10:02:15
Let's say we have two threads: one that gives a "go" and one that waits for the go before producing something. Is this code correct, or can I get an "infinite loop" because of caching or something like that?

```cpp
std::atomic_bool canGo{false};

void producer()
{
    while (canGo.load(memory_order_relaxed) == false);
    produce_data();
}

void launcher()
{
    canGo.store(true, memory_order_relaxed);
}

int main()
{
    thread a{producer};
    thread b{launcher};
}
```

If this code is not correct, is there a way to flush / invalidate the cache in standard C++?

A go signal like this will usually be in response to some memory changes that you …
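The answer is cut off above; the point it appears to be heading toward is that a relaxed spin will still observe the store in finite time on real implementations, but relaxed ordering cannot order any payload written before the flag. When the "go" publishes data, the usual fix is a release/acquire pair (my sketch, not the original answer):

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;
std::atomic<bool> canGo{false};

void producer() {
    while (!canGo.load(std::memory_order_acquire)) {}  // wait for go
    std::printf("%d\n", payload);                      // guaranteed to print 42
}

void launcher() {
    payload = 42;                                  // the data the signal announces
    canGo.store(true, std::memory_order_release);  // publish the flag last
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join(); b.join();
}
```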

std::memory_order_relaxed atomicity with respect to the same atomic variable

Submitted by 你。 on 2019-12-04 08:23:36
The cppreference documentation about memory orders says: "Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor)." Does this mean that relaxed memory ordering doesn't actually result in atomicity with respect to the same variable, but rather just results in eventual consistency with respect to other relaxed loads and/or compare_exchanges? Using std: …
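The counter case the quote describes can be demonstrated directly (my sketch): a relaxed fetch_add is still a single atomic read-modify-write, so no increments are ever lost; "relaxed" only drops ordering relative to *other* memory locations, not atomicity on the counter itself:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<long> refs{0};

int main() {
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([] {
            for (int j = 0; j < 100000; ++j)
                refs.fetch_add(1, std::memory_order_relaxed); // atomic RMW
        });
    for (auto& t : ts) t.join();
    assert(refs.load() == 400000);  // never fails: atomicity is preserved
}
```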

GCC reordering up across load with `memory_order_seq_cst`. Is this allowed?

Submitted by ≯℡__Kan透↙ on 2019-12-04 02:24:24
Using a simplified version of a basic seqlock, gcc reorders a nonatomic load up across an atomic load(memory_order_seq_cst) when compiling the code with -O3. This reordering isn't observed when compiling at other optimization levels or when compiling with clang (even at O3). The reordering seems to violate a synchronizes-with relationship that should be established, and I'm curious to know why gcc reorders this particular load and whether this is even allowed by the standard. Consider the following load function:

```cpp
auto load()
{
    std::size_t copy;
    std::size_t seq0 = 0, seq1 = 0;
    do {
        seq0 = seq.load(std::memory_order_seq_cst); // the excerpt ends here; the
        copy = data;                                // remainder shown is the usual
        seq1 = seq.load(std::memory_order_seq_cst); // seqlock read-retry shape,
    } while (seq0 != seq1);                         // with `data` an assumed name
    return copy;
}
```
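Independent of why gcc hoists the load, the data race on the nonatomic payload can be removed, which also pins the payload read between the two sequence reads. Below is the common fence-based reader recipe (my sketch, not the question's accepted answer; seq and the payload names are assumed from the truncated snippet):

```cpp
#include <atomic>
#include <cstddef>

std::atomic<std::size_t> seq{0};    // sequence counter (assumed writer bumps it
std::atomic<std::size_t> value{0};  // to odd before, even after, each update)

std::size_t load() {
    std::size_t copy, seq0, seq1;
    do {
        seq0 = seq.load(std::memory_order_acquire);           // first read
        copy = value.load(std::memory_order_relaxed);         // payload
        std::atomic_thread_fence(std::memory_order_acquire);  // keeps the payload
        seq1 = seq.load(std::memory_order_relaxed);           // read above this one
    } while (seq0 != seq1 || (seq0 & 1));                     // retry if torn/odd
    return copy;
}
```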