memory-barriers

Memory Barriers: a Hardware View for Software Hackers Example 3

Submitted by 自闭症网瘾萝莉.ら on 2019-12-06 02:55:44
I am copying the text for that figure from the original paper, Memory Barriers: a Hardware View for Software Hackers. Table 4 shows three code fragments, executed concurrently by CPUs 0, 1, and 2. All variables are initially zero. Note that neither CPU 1 nor CPU 2 can proceed to line 5 until they see CPU 0's assignment to "b" on line 3. Once CPUs 1 and 2 have executed their memory barriers on line 4, they are both guaranteed to see all assignments by CPU 0 preceding its memory barrier on line 2. Similarly, CPU 0's memory barrier on line 8 pairs with those of CPUs 1 and 2 on line 4, so that CPU 0 does not execute its assignment on line 9 until its earlier assignments are visible to both of the other CPUs.
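Table 4 itself did not survive the copy, so here is a sketch of the structure the quoted text describes, rendered with C++11 fences. Only the variable name "b" and the line roles come from the text above; the other variable names (a, c, d, e), the exact layout, and the use of seq_cst fences in place of the paper's kernel-style smp_mb() are my assumptions:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> a{0}, b{0}, c{0}, d{0}, e{0};

void cpu0() {
    a.store(1, std::memory_order_relaxed);               // line 1
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 2: barrier
    b.store(1, std::memory_order_relaxed);               // line 3
    while (c.load(std::memory_order_relaxed) == 0) {}    // wait for CPU 1
    while (d.load(std::memory_order_relaxed) == 0) {}    // wait for CPU 2
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 8: pairs with the
    e.store(1, std::memory_order_relaxed);               // line 9   barriers on line 4
}

void cpu1() {
    while (b.load(std::memory_order_relaxed) == 0) {}    // cannot reach line 5 yet
    std::atomic_thread_fence(std::memory_order_seq_cst); // line 4: barrier
    c.store(1, std::memory_order_relaxed);               // line 5: a == 1 is visible now
}

void cpu2() {
    while (b.load(std::memory_order_relaxed) == 0) {}
    std::atomic_thread_fence(std::memory_order_seq_cst);
    d.store(1, std::memory_order_relaxed);
}

int main() {
    std::thread t0{cpu0}, t1{cpu1}, t2{cpu2};
    t0.join(); t1.join(); t2.join();
}
```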

Deep understanding of volatile in Java

Submitted by 元气小坏坏 on 2019-12-05 20:28:24
Does Java allow the output 1, 0? I've tested it very intensively and I cannot get that output; I only ever get 1, 1 or 0, 0 or 0, 1.

```java
public class Main {
    private int x;
    private volatile int g;

    // Executed by thread #1
    public void actor1() {
        x = 1;
        g = 1;
    }

    // Executed by thread #2
    public void actor2() {
        put_on_screen_without_sync(g);
        put_on_screen_without_sync(x);
    }
}
```

Why? To my eye it should be possible to get 1, 0. My reasoning: g is volatile, so memory order will be ensured. So it looks like:

actor1: (1) store(x, 1) (2) store(g, 1) (3) memory_barrier // on x86

and I see the following …
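For comparison, the same shape can be written in C++ with std::atomic (my own analogue, not part of the question), where the release/acquire pairing makes explicit why "1, 0" is the one forbidden output:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0};  // relaxed ops stand in for the plain Java field
std::atomic<int> g{0};  // release/acquire stands in for Java volatile

void actor1() {
    x.store(1, std::memory_order_relaxed);
    g.store(1, std::memory_order_release);       // like the Java volatile write
}

void actor2() {
    int gv = g.load(std::memory_order_acquire);  // like the Java volatile read
    int xv = x.load(std::memory_order_relaxed);
    // If gv == 1, the acquire load synchronized with the release store,
    // so xv must also be 1: "1, 0" can never be printed.
    std::printf("%d, %d\n", gv, xv);
}

int main() {
    std::thread t1{actor1}, t2{actor2};
    t1.join(); t2.join();
}
```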

Force order of execution of C statements?

Submitted by 青春壹個敷衍的年華 on 2019-12-05 14:17:31
I have a problem with the MS C compiler reordering certain statements, critical in a multithreading context, at high levels of optimization. I want to know how to force ordering in specific places while still using high levels of optimization. (At low levels of optimization, this compiler does not reorder statements.) The following code:

```c
ChunkT* plog2sizeChunk = ...
SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit on this chunk of storage
x = plog2sizeChunk->pNext;
```

produces this: …
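One portable way to pin the ordering without giving up optimization (my suggestion; the MSVC-specific discussion is cut off above) is a compiler-only fence between the two statements. ChunkT and SET_BUSY below are hypothetical stand-ins for the question's definitions:

```cpp
#include <atomic>

struct ChunkT { unsigned pPoolAndBusyFlag; ChunkT* pNext; }; // stand-in type
#define SET_BUSY(f) ((f) |= 0x1u)                            // stand-in macro

ChunkT* x;

void mark_busy_then_read(ChunkT* plog2sizeChunk) {
    SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit
    // Compiler-only barrier: in practice the optimizer will not move
    // memory accesses across it. No CPU fence instruction is emitted,
    // so this constrains the compiler, not the hardware.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    x = plog2sizeChunk->pNext;
}
```

If other threads read the busy flag concurrently, a compiler barrier alone is not enough; the flag itself should become a std::atomic with at least release semantics on the store.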

In OpenCL, what does mem_fence() do, as opposed to barrier()?

Submitted by 邮差的信 on 2019-12-05 00:50:14
Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10) that mem_fence() "orders loads and stores of a work-item executing a kernel", so it applies to a single work-item. But at the same time, section 3.3.1 says that "within a work-item memory has load / store consistency", so within a work-item the memory is already consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items, yet isn't needed within an item... Note that I haven't used atomic operations (section 9.5 etc.). Is …
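The classic use is ordering one work-item's stores as *other* items observe them: write the payload, fence, then write a flag. Below is a C++ analogy of that pattern (my sketch, not OpenCL; in a kernel the fences would be write_mem_fence / read_mem_fence and the two functions would be separate work-items):

```cpp
#include <atomic>
#include <thread>

int data = 0;
std::atomic<int> flag{0};

void producer() {                                         // plays work-item A
    data = 42;                                            // payload first
    std::atomic_thread_fence(std::memory_order_release);  // ~ write_mem_fence
    flag.store(1, std::memory_order_relaxed);             // then the flag
}

void consumer() {                                         // plays work-item B
    while (flag.load(std::memory_order_relaxed) == 0) {}  // poll the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // ~ read_mem_fence
    // The fences order A's two stores as B observes them: data == 42 here.
}

int main() {
    std::thread c{consumer}, p{producer};
    p.join(); c.join();
}
```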

difference between Memory Barriers and lock prefixed instruction

Submitted by 别等时光非礼了梦想. on 2019-12-04 23:32:22
In the article Memory Barriers and JVM Concurrency, I was told that volatile is implemented with various memory-barrier instructions, while synchronized and atomic are implemented with lock-prefixed instructions. But I came across the following code in another article:

Java code:

```java
volatile Singleton instance = new Singleton();
```

Assembly instructions (x86):

```
0x01a3de1d: movb $0x0,0x1104800(%esi)
0x01a3de24: lock addl $0x0,(%esp)
```

So which one is right? And what is the difference between memory barriers and lock-prefixed instructions?

Short answer: lock instructions are used to …
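Both articles describe the same thing at different levels: the barrier is the abstract requirement, and on x86 a lock-prefixed instruction is one way to satisfy its StoreLoad part. The same choice shows up in C++ codegen (my example, not from either article):

```cpp
#include <atomic>

std::atomic<int> ready{0};

void publish() {
    // On x86-64, gcc and clang typically compile this seq_cst store either
    // to a single `xchg` (which carries lock semantics) or, in older
    // codegen, to `mov` followed by `mfence`. Either form supplies the
    // same full barrier the JVM gets from `lock addl $0x0,(%esp)` after
    // a volatile store.
    ready.store(1, std::memory_order_seq_cst);
}
```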

Is memory barrier or atomic operation required in a busy-wait loop?

Submitted by ▼魔方 西西 on 2019-12-04 16:37:21
Consider the following spin_lock() implementation, originally from this answer:

```c
void spin_lock(volatile bool* lock)
{
    for (;;) {
        // inserts an acquire memory barrier and a compiler barrier
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;

        while (*lock)  // no barriers; is it OK?
            cpu_relax();
    }
}
```

What I already know: volatile prevents the compiler from optimizing out the *lock re-read on each iteration of the while loop, but volatile inserts neither memory nor compiler barriers; such an …
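For comparison, here is the same test-and-test-and-set shape in portable C++11 (my sketch, not from the question). The inner wait loop can stay relaxed because the next exchange re-runs with acquire ordering before the critical section is entered:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> locked{false};

void spin_lock() {
    for (;;) {
        if (!locked.exchange(true, std::memory_order_acquire))
            return;                                     // lock acquired
        while (locked.load(std::memory_order_relaxed))  // cheap wait loop
            std::this_thread::yield();                  // stand-in for cpu_relax()
    }
}

void spin_unlock() {
    locked.store(false, std::memory_order_release);
}
```

Usage is the obvious bracket: spin_lock(); /* critical section */ spin_unlock();.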

relaxed ordering as a signal

Submitted by 南笙酒味 on 2019-12-04 10:02:15
Let's say we have two threads: one that gives a "go" and one that waits for the go before producing something. Is this code correct, or can I get an "infinite loop" because of caching or something like that?

```cpp
std::atomic_bool canGo{false};

void producer()
{
    while (canGo.load(memory_order_relaxed) == false);
    produce_data();
}

void launcher()
{
    canGo.store(true, memory_order_relaxed);
}

int main()
{
    thread a{producer};
    thread b{launcher};
}
```

If this code is not correct, is there a way to flush / invalidate the cache in standard C++?

A go signal like this will usually be in response to some memory changes that you …
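The answer is cut off above; the point it appears to be heading toward is that a relaxed spin will still observe the store in finite time on real implementations, but relaxed ordering cannot order any payload written before the flag. When the "go" publishes data, the usual fix is a release/acquire pair (my sketch, not the original answer):

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;
std::atomic<bool> canGo{false};

void producer() {
    while (!canGo.load(std::memory_order_acquire)) {}  // wait for go
    std::printf("%d\n", payload);                      // guaranteed to print 42
}

void launcher() {
    payload = 42;                                  // the data the signal announces
    canGo.store(true, std::memory_order_release);  // publish the flag last
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join(); b.join();
}
```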

std::memory_order_relaxed atomicity with respect to the same atomic variable

Submitted by 你。 on 2019-12-04 08:23:36
The cppreference documentation about memory orders says: "Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor)." Does this mean that relaxed memory ordering doesn't actually result in atomicity with respect to the same variable, but rather just results in eventual consistency with respect to other relaxed loads and/or compare_exchanges? Using std: …
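The counter case the quote describes can be demonstrated directly (my sketch): a relaxed fetch_add is still a single atomic read-modify-write, so no increments are ever lost; "relaxed" only drops ordering relative to *other* memory locations, not atomicity on the counter itself:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<long> refs{0};

int main() {
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([] {
            for (int j = 0; j < 100000; ++j)
                refs.fetch_add(1, std::memory_order_relaxed); // atomic RMW
        });
    for (auto& t : ts) t.join();
    assert(refs.load() == 400000);  // never fails: atomicity is preserved
}
```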

GCC reordering up across load with `memory_order_seq_cst`. Is this allowed?

Submitted by ≯℡__Kan透↙ on 2019-12-04 02:24:24
Using a simplified version of a basic seqlock, gcc reorders a nonatomic load up across an atomic load(memory_order_seq_cst) when compiling the code with -O3. This reordering isn't observed when compiling at other optimization levels or when compiling with clang (even at O3). The reordering seems to violate a synchronizes-with relationship that should be established, and I'm curious to know why gcc reorders this particular load and whether this is even allowed by the standard. Consider the following load function:

```cpp
auto load()
{
    std::size_t copy;
    std::size_t seq0 = 0, seq1 = 0;
    do {
        seq0 = seq.load(std::memory_order_seq_cst); // the excerpt ends here; the
        copy = data;                                // remainder shown is the usual
        seq1 = seq.load(std::memory_order_seq_cst); // seqlock read-retry shape,
    } while (seq0 != seq1);                         // with `data` an assumed name
    return copy;
}
```
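Independent of why gcc hoists the load, the data race on the nonatomic payload can be removed, which also pins the payload read between the two sequence reads. Below is the common fence-based reader recipe (my sketch, not the question's accepted answer; seq and the payload names are assumed from the truncated snippet):

```cpp
#include <atomic>
#include <cstddef>

std::atomic<std::size_t> seq{0};    // sequence counter (assumed writer bumps it
std::atomic<std::size_t> value{0};  // to odd before, even after, each update)

std::size_t load() {
    std::size_t copy, seq0, seq1;
    do {
        seq0 = seq.load(std::memory_order_acquire);           // first read
        copy = value.load(std::memory_order_relaxed);         // payload
        std::atomic_thread_fence(std::memory_order_acquire);  // keeps the payload
        seq1 = seq.load(std::memory_order_relaxed);           // read above this one
    } while (seq0 != seq1 || (seq0 & 1));                     // retry if torn/odd
    return copy;
}
```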