memory-barriers

How do a mutex's lock and unlock functions prevent CPU reordering?

狂风中的少年 submitted on 2019-11-27 23:06:50
Question: As far as I know, a function call acts as a compiler barrier, but not as a CPU barrier. This tutorial says the following: acquiring a lock implies acquire semantics, while releasing a lock implies release semantics! All the memory operations in between are contained inside a nice little barrier sandwich, preventing any undesirable memory reordering across the boundaries. I assume that the above quote is talking about CPU reordering and not about compiler reordering. But I don't understand …
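A minimal sketch (not the tutorial's or the questioner's code) of where that "barrier sandwich" comes from: a toy spinlock whose lock() is an acquire operation and whose unlock() is a release operation, so neither the CPU nor the compiler may move accesses in the critical section outside the pair. A real std::mutex gives the same ordering guarantees.

```cpp
#include <atomic>

// Toy spinlock, for illustration only. The acquire in lock() and the release
// in unlock() are what keep the critical section's memory operations inside
// the lock/unlock pair, on the CPU as well as in the compiler.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: reads/writes after this cannot be moved before it.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin
        }
    }
    void unlock() {
        // Release: reads/writes before this cannot be moved after it.
        flag_.clear(std::memory_order_release);
    }
};
```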

Does an x86 CPU reorder instructions?

怎甘沉沦 submitted on 2019-11-27 22:34:08
I have read that some CPUs reorder instructions, but this is not a problem for single-threaded programs (the instructions would still be reordered in single-threaded programs, but it would appear as if the instructions were executed in order); it is only a problem for multithreaded programs. To solve the problem of instruction reordering, we can insert memory barriers in the appropriate places in the code. But does an x86 CPU reorder instructions? If it does not, then there is no need to use memory barriers, right? Reordering: Yes, all modern x86 chips from Intel and AMD aggressively reorder …
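To make the "yes, x86 reorders" answer concrete, here is the classic store-buffer litmus test as a sketch (names and structure are mine, not from the question). On x86 the only reordering other cores can observe is StoreLoad, and this test can expose it unless sequentially consistent operations, which emit a full barrier, are used.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Store-buffer litmus test. On x86, each store can sit in the store buffer
// while the subsequent load executes, so with relaxed (or even acquire/release)
// ordering both r1 and r2 can end up 0. Using memory_order_seq_cst instead
// forces a full barrier and rules that outcome out.
std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        r1 = y.load(std::memory_order_relaxed);   // may effectively run before the store is visible
    });
    std::thread t2([] {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2);          // r1==0 && r2==0 is a permitted outcome
}
```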

How do JVM's implicit memory barriers behave when chaining constructors?

不打扰是莪最后的温柔 submitted on 2019-11-27 21:41:30
Question: Referring to my earlier question on incompletely constructed objects, I have a second question. As Jon Skeet pointed out, there's an implicit memory barrier at the end of a constructor that makes sure that final fields are visible to all threads. But what if a constructor calls another constructor; is there such a memory barrier at the end of each of them, or only at the end of the one that was called in the first place? That is, when the "wrong" solution is: public class ThisEscape { public …

Which is a better write barrier on x86: lock+addl or xchgl?

99封情书 submitted on 2019-11-27 19:46:19
The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference and which is better? Does x86 also require read barrier instructions? RE2 defines its read barrier function as a no-op on x86, while Linux defines it as either lfence or a no-op depending on whether SSE2 is available. When is lfence required? GJ. The " lock; addl $0,0(%%esp) " is faster in the case where we are testing the 0 state of the lock variable at the (%%esp) address, because we add the value 0 to the lock variable and the zero flag is set to 1 if the lock value …
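For reference, a sketch of the two idioms quoted above as GCC-style inline asm wrappers (my own wrappers for 32-bit x86, where %esp is the stack pointer; the kernel's and RE2's real macros differ). Both act as full barriers: a LOCK-prefixed instruction, and xchg with a memory operand (which is implicitly locked), force all earlier stores to become globally visible first.

```cpp
// Hedged sketch only; not the kernel's or RE2's actual barrier macros.

static inline void wmb_lock_addl() {
    // Locked read-modify-write of the word at the top of the stack; the value
    // is unchanged (adds 0), but the LOCK prefix makes it a full barrier.
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
}

static inline void wmb_xchg() {
    int slot = 0;
    int reg  = 0;
    // xchg between a register and a memory slot is implicitly locked on x86,
    // so it also acts as a full barrier.
    __asm__ __volatile__("xchgl %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
}
```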

Does a memory barrier ensure that the cache coherence has been completed?

孤人 submitted on 2019-11-27 18:51:52
Say I have two threads that manipulate the global variable x. Each thread (or each core, I suppose) will have a cached copy of x. Now say that Thread A executes the following instructions: set x to 5, then some other instruction. Now when set x to 5 is executed, the cached value of x will be set to 5; this will cause the cache coherence protocol to act and update the caches of the other cores with the new value of x. Now my question is: when x is actually set to 5 in Thread A's cache, do the caches of the other cores get updated before some other instruction is executed? Or should a memory …
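A hedged sketch of how this is usually expressed in C++ (my own example, not the questioner's code): the barrier does not "push" x to the other caches; coherence propagates the store by itself. What a release/acquire pair adds is ordering, so a reader that has seen the flag is guaranteed also to see x == 5.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The release store orders "x = 5" before the flag; the acquire load orders
// the flag check before the read of x. Once the reader observes ready == true,
// it is guaranteed to observe x == 5 as well.
int x = 0;
std::atomic<bool> ready{false};

int main() {
    std::thread writer([] {
        x = 5;                                        // "set x to 5"
        ready.store(true, std::memory_order_release); // "some other instruction"
    });
    std::thread reader([] {
        while (!ready.load(std::memory_order_acquire)) { }
        assert(x == 5);                               // guaranteed once ready is seen
    });
    writer.join();
    reader.join();
}
```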

C++ Memory Barriers for Atomics

ぃ、小莉子 submitted on 2019-11-27 18:22:27
I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers? The Windows MemoryBarrier(); The fence _mm_mfence(); The inline assembly asm volatile ("" : : : "memory"); The intrinsic _ReadWriteBarrier(); If there isn't a simple explanation, some links to good articles or books would probably help me get it straight. Until now I was fine with just using objects written by others wrapping these calls, but I'd like to have a better understanding than my current thinking, which is basically along the lines of: there is more …
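One way to see the main split, sketched in portable C++11 spellings rather than the exact Windows/GCC intrinsics: a compiler-only barrier (roughly what asm volatile("" ::: "memory") and _ReadWriteBarrier() give you) versus a full hardware fence (roughly what MemoryBarrier() and _mm_mfence() give you).

```cpp
#include <atomic>

void compiler_barrier_only() {
    // Stops the *compiler* from moving memory accesses across this point,
    // but emits no CPU instruction. Roughly comparable to
    // asm volatile("" ::: "memory") and _ReadWriteBarrier().
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void full_hardware_fence() {
    // Also constrains the *CPU*; on x86 compilers typically emit mfence or a
    // locked instruction here. Roughly comparable to MemoryBarrier() and
    // _mm_mfence().
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```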

Memory model ordering and visibility?

可紊 submitted on 2019-11-27 17:21:06
I tried looking for details on this; I even read the standard on mutexes and atomics... but still I couldn't understand the C++11 memory model visibility guarantees. From what I understand, the very important feature of a mutex, besides mutual exclusion, is ensuring visibility. That is, it is not enough that only one thread at a time is incrementing the counter; it is important that each thread increments the counter value that was stored by the thread that last held the mutex (I really don't know why people don't mention this more when discussing mutexes; maybe I had bad teachers :)). So from what I can tell …
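A minimal sketch of the counter example from the question (names are mine): the unlock is a release and the next lock is an acquire, so each thread is guaranteed to see the value stored by the thread that last held the mutex, not merely to be alone in the critical section.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>

// Each unlock releases, each lock acquires, so every increment observes the
// value published by the previous holder of the mutex.
std::mutex m;
int counter = 0;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(m);
        ++counter;          // always increments the latest published value
    }
}

int main() {
    std::thread t1(add_many, 100000);
    std::thread t2(add_many, 100000);
    t1.join();
    t2.join();
    std::printf("%d\n", counter);   // always 200000
}
```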

Does the semantics of `std::memory_order_acquire` require processor instructions on x86/x86_64?

二次信任 submitted on 2019-11-27 16:35:11
Question: It is known that on x86, for load() and store() operations, the memory orderings memory_order_consume, memory_order_acquire, memory_order_release and memory_order_acq_rel do not require processor instructions for the cache and pipeline; the generated assembly always corresponds to std::memory_order_relaxed, and these restrictions are necessary only to constrain compiler optimization: http://www.stdthread.co.uk/forum/index.php?topic=72.0 And the disassembly of this code confirms this for store() …
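To illustrate what "corresponds to relaxed" typically means in practice, a hedged sketch (actual codegen varies by compiler and version): acquire loads and release stores on x86 usually compile to a plain mov, while a sequentially consistent store is the case that does need a full barrier.

```cpp
#include <atomic>

std::atomic<int> a{0};

int load_acquire() {
    // On x86 this is typically just: mov eax, [a]
    // i.e. the same instruction as a relaxed load; only compiler reordering
    // is restricted.
    return a.load(std::memory_order_acquire);
}

void store_release(int v) {
    // Typically just a plain mov to [a]; no extra fence instruction.
    a.store(v, std::memory_order_release);
}

void store_seq_cst(int v) {
    // The exception: a sequentially consistent store usually does need a full
    // barrier, typically emitted as xchg [a], reg (or mov followed by mfence).
    a.store(v, std::memory_order_seq_cst);
}
```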

Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?

大兔子大兔子 submitted on 2019-11-27 15:16:19
Question: I wrote some lock-free code that works fine with local reads, under most conditions. Does local spinning on a memory read necessarily imply I have to ALWAYS insert a memory barrier before the spinning read? (To validate this, I managed to produce a reader/writer combination which results in a reader never seeing the written value, under certain very specific conditions--dedicated CPU, process attached to CPU, optimizer turned all the way up, no other work done in the loop--so the arrows do …
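A sketch of how this is usually written in C++11 terms (mine, not the questioner's lock-free code): the spinning read itself is an acquire load on an atomic, paired with the writer's release store, which both supplies the ordering and stops the optimizer from hoisting the read out of the loop (the failure mode described above).

```cpp
#include <atomic>

// The barrier lives in the atomic operations themselves: release on the
// writer's publish, acquire on every iteration of the spinning read.
std::atomic<int> published{0};
int payload = 0;

void writer(int value) {
    payload = value;
    published.store(1, std::memory_order_release);   // publish
}

int reader() {
    while (published.load(std::memory_order_acquire) == 0) {
        // spin; a plain non-atomic read here would be a data race and could
        // legally be hoisted out of the loop by the optimizer
    }
    return payload;                                   // safe to read now
}
```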

If I don't use fences, how long could it take a core to see another core's writes?

淺唱寂寞╮ submitted on 2019-11-27 15:09:10
Question: I have been trying to Google my question, but I honestly don't know how to succinctly state it. Suppose I have two threads in a multi-core Intel system. These threads are running on the same NUMA node. Suppose thread 1 writes to X once, then only reads it occasionally moving forward. Suppose further that, among other things, thread 2 reads X continuously. If I don't use a memory fence, how long could it be between thread 1 writing X and thread 2 seeing the updated value? I understand …
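As a rough, unscientific illustration (all names are mine, and the result depends entirely on the hardware, the scheduler and where the threads land), one can time how long a relaxed store takes to become visible to another thread's relaxed loads; treat any number printed as an illustration, not a guarantee.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

// Writer publishes a timestamp with a relaxed store; the reader spins with
// relaxed loads until the store becomes visible, then reports the delta.
// The measurement includes timing overhead, so it is only a rough upper bound.
int main() {
    std::atomic<std::int64_t> stamp{0};   // 0 means "not written yet"

    std::thread reader([&] {
        std::int64_t written;
        while ((written = stamp.load(std::memory_order_relaxed)) == 0) {
            // spin until the writer's store becomes visible
        }
        auto seen = std::chrono::duration_cast<std::chrono::nanoseconds>(
                        std::chrono::steady_clock::now().time_since_epoch()).count();
        std::printf("store became visible after roughly %lld ns\n",
                    static_cast<long long>(seen - written));
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // let the reader start spinning
    stamp.store(std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::steady_clock::now().time_since_epoch()).count(),
                std::memory_order_relaxed);
    reader.join();
}
```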