memory-barriers

Determining the location for the usage of barriers (fences)

雨燕双飞 submitted on 2019-11-29 12:06:21
The x86 instructions lfence/sfence/mfence are used to implement the rmb()/wmb()/mb() mechanisms in the Linux kernel. It is easy to understand that these serialize memory accesses. However, it is much harder to determine, while writing code, when and where they are needed -- before encountering the bug at runtime. I was interested to know if there are known caveats that could be checked, while writing/reviewing code, that can help us determine where the barriers must be inserted. I understand this is too complex, but is there a rule-of-thumb or a …
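A minimal sketch of the classic case where such barriers matter -- publishing data behind a flag -- written with C++ std::atomic fences rather than the kernel's wmb()/rmb() (the pattern is the same; the variable names here are illustrative):

    #include <atomic>

    int payload = 0;                      // plain data, written before the flag
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;
        // Like wmb(): the data store must not be reordered after the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    }

    void consumer() {
        while (!ready.load(std::memory_order_relaxed)) { }
        // Like rmb(): the flag load must not be reordered after the data load.
        std::atomic_thread_fence(std::memory_order_acquire);
        int v = payload;                  // guaranteed to observe 42
        (void)v;
    }

A workable rule of thumb: a barrier is needed wherever one thread's ordering of two memory operations is part of the contract another thread relies on, as in the flag/payload pair above.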

Is a memory barrier an instruction that the CPU executes, or is it just a marker?

浪尽此生 submitted on 2019-11-29 11:48:44
Question: I am trying to understand what a memory barrier is, exactly. Based on what I know so far, a memory barrier (for example, mfence) is used to prevent the reordering of instructions from before to after and from after to before the memory barrier. This is an example of a memory barrier in use:

    instruction 1
    instruction 2
    instruction 3
    mfence
    instruction 4
    instruction 5
    instruction 6

Now my question is: is the mfence instruction just a marker telling the CPU in what order to execute the …
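For what it's worth, mfence is a real instruction the CPU executes (among other things, it drains the store buffer before later loads complete), not just a marker for the assembler. One way to see the instruction being emitted, assuming GCC or Clang on x86-64:

    #include <atomic>

    void full_barrier() {
        // Compiled with `g++ -O2 -S`, this typically emits an `mfence`
        // instruction (some compilers instead emit a dummy lock-prefixed
        // read-modify-write, which has the same full-barrier effect).
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }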

Do memory barriers guarantee a fresh read in C#?

落花浮王杯 submitted on 2019-11-29 11:25:05
If we have the following code in C#:

    int a = 0;
    int b = 0;

    void A() // runs in thread A
    {
        a = 1;
        Thread.MemoryBarrier();
        Console.WriteLine(b);
    }

    void B() // runs in thread B
    {
        b = 1;
        Thread.MemoryBarrier();
        Console.WriteLine(a);
    }

The MemoryBarriers make sure that the write takes place before the read. However, is it guaranteed that the write of one thread is seen by the read on the other thread? In other words, is it guaranteed that at least one thread prints 1, or could both threads print 0? I know that several questions already exist that are relevant to "freshness" and …
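In C++ terms, this is the store-buffer litmus test, and with full barriers on both sides the 0/0 outcome is ruled out. A minimal sketch (same variable names as the question):

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> a{0}, b{0};

    // With the default seq_cst ordering, the store and the load in each
    // thread cannot be reordered with each other, so at least one thread
    // must observe the other's write: both printing 0 is impossible.
    void A() { a.store(1); std::printf("%d\n", b.load()); }
    void B() { b.store(1); std::printf("%d\n", a.load()); }

    int main() {
        std::thread t1(A), t2(B);
        t1.join();
        t2.join();
    }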

Memory barrier vs. Interlocked: impact on memory cache coherency timing

五迷三道 submitted on 2019-11-29 07:03:12
Question: Simplified question: is there a difference in the timing of memory cache coherency (or "flushing") caused by Interlocked operations compared to memory barriers? Let's consider, in C#, any Interlocked operation vs Thread.MemoryBarrier(). I believe there is a difference. Background: I have read quite a bit about memory barriers -- all of it about how they prevent specific kinds of memory-instruction reordering -- but I couldn't find consistent information on whether they should cause …
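One point worth separating out: neither construct "flushes" caches; the cache-coherence protocol keeps caches coherent on its own, and both constructs only constrain ordering. A sketch of the contrast in C++ terms (std::atomic::fetch_add standing in for an Interlocked operation):

    #include <atomic>

    std::atomic<int> counter{0};

    void interlocked_style() {
        // Atomic read-modify-write: on x86 this compiles to `lock xadd`,
        // which is both atomic and a full barrier. It does not flush any
        // cache; MESI-style coherence handles visibility regardless.
        counter.fetch_add(1, std::memory_order_seq_cst);
    }

    void barrier_style() {
        // Standalone full fence: on x86 typically `mfence`. It orders this
        // thread's own memory operations but is not itself atomic and
        // modifies nothing.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }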

Compiler reordering around mutex boundaries?

自古美人都是妖i submitted on 2019-11-29 06:59:01
Suppose I have my own non-inline functions LockMutex and UnlockMutex, which use some proper mutex -- such as boost -- inside. How will the compiler know not to reorder other operations with respect to calls to LockMutex and UnlockMutex? It cannot possibly know how I will implement these functions in some other compilation unit.

    void SomeClass::store(int i) {
        LockMutex(_m);
        _field = i; // could the compiler move this around?
        UnlockMutex(_m);
    }

ps: One is supposed to use instances of classes for holding locks to guarantee unlocking. I have left this out to simplify the example. It can …
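The short answer is that the opacity is exactly what protects you: a call to a function the compiler cannot see into acts as a compiler barrier, because the compiler must assume the callee may read or write any memory reachable through its arguments or globals. A sketch under that assumption (the types and definitions here are illustrative; link-time optimization can inline the calls, at which point the barriers inside the mutex implementation itself take over):

    struct Mutex { /* opaque handle */ };

    // Declared here, defined in another translation unit -- opaque to this caller.
    void LockMutex(Mutex& m);
    void UnlockMutex(Mutex& m);

    class SomeClass {
        Mutex _m;
        int _field = 0;
    public:
        void store(int i) {
            LockMutex(_m);   // unknown side effects: compiler barrier
            _field = i;      // cannot be hoisted above or sunk below the calls
            UnlockMutex(_m); // unknown side effects: compiler barrier
        }
    };

CPU-level reordering is a separate concern, handled by the acquire/release instructions inside the mutex implementation, not by the call boundary.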

How do mutex lock and unlock functions prevent CPU reordering?

不问归期 submitted on 2019-11-29 05:22:19
As far as I know, a function call acts as a compiler barrier, but not as a CPU barrier. This tutorial says the following: acquiring a lock implies acquire semantics, while releasing a lock implies release semantics! All the memory operations in between are contained inside a nice little barrier sandwich, preventing any undesirable memory reordering across the boundaries. I assume that the above quote is talking about CPU reordering and not about compiler reordering. But I don't understand how mutex lock and unlock cause the CPU to give these functions acquire and release semantics.
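The semantics come from the instructions inside the lock and unlock paths themselves, not from the function-call boundary. A minimal spinlock sketch in C++ makes this concrete: the acquiring operation is an atomic read-modify-write with acquire ordering, and the unlock is a store with release ordering, so the CPU sees real ordering constraints:

    #include <atomic>

    class SpinLock {
        std::atomic<bool> locked{false};
    public:
        void lock() {
            // Acquire semantics: later memory operations cannot be
            // reordered before this atomic exchange.
            while (locked.exchange(true, std::memory_order_acquire)) { }
        }
        void unlock() {
            // Release semantics: earlier memory operations cannot be
            // reordered after this store.
            locked.store(false, std::memory_order_release);
        }
    };

On x86 the exchange compiles to a lock-prefixed instruction and the release store to a plain mov, which together give exactly the "barrier sandwich" the tutorial describes.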

If I don't use fences, how long could it take a core to see another core's writes?

岁酱吖の submitted on 2019-11-29 04:23:44
I have been trying to Google my question, but I honestly don't know how to state it succinctly. Suppose I have two threads in a multi-core Intel system. These threads are running on the same NUMA node. Suppose thread 1 writes to X once, then only reads it occasionally moving forward. Suppose further that, among other things, thread 2 reads X continuously. If I don't use a memory fence, how long could it be between thread 1 writing X and thread 2 seeing the updated value? I understand that the write of X will go to the store buffer and from there to the cache, at which point MESIF will …
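The usual answer is "tens of nanoseconds": stores drain from the store buffer on their own, and fences don't make them visible any sooner -- they only order them. A crude measurement sketch (timestamps come from a single steady clock, so clock overhead and spin granularity dominate the low end):

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    static long long now_ns() {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(
            steady_clock::now().time_since_epoch()).count();
    }

    std::atomic<long long> stamp{0}; // 0 = not written yet

    int main() {
        std::thread reader([] {
            long long s;
            // Spin until the fence-free (relaxed) store becomes visible.
            while ((s = stamp.load(std::memory_order_relaxed)) == 0) { }
            std::printf("observed ~%lld ns after the write\n", now_ns() - s);
        });
        // Give the reader time to settle into its spin loop first.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        stamp.store(now_ns(), std::memory_order_relaxed);
        reader.join();
    }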

Are memory-barriers required when joining on a thread?

梦想的初衷 submitted on 2019-11-29 03:55:52
Question: If a thread A spawns another thread B with the single purpose of writing to a variable V, and then waits for it to terminate, are memory barriers required to ensure that subsequent reads of V on thread A are fresh? I'm unsure if there are any implicit barriers in the termination/joining operations that make them redundant. Here's an example:

    public static T ExecuteWithCustomStackSize<T>(Func<T> func, int stackSize)
    {
        T result = default(T);
        var thread = new Thread(
            () => {
                result = func(); …
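For comparison, C++ spells the guarantee out explicitly: the completion of a thread synchronizes-with the corresponding join(), so no extra barrier is needed to read what the joined thread wrote (Thread.Join in .NET is generally understood to provide the same happens-before edge). A sketch:

    #include <cassert>
    #include <thread>

    int v = 0; // plain, non-atomic variable

    int main() {
        std::thread t([] { v = 42; });
        t.join();        // t's completion synchronizes-with this return
        assert(v == 42); // guaranteed; no additional barrier required
    }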

Do the semantics of `std::memory_order_acquire` require processor instructions on x86/x86_64?

孤街浪徒 submitted on 2019-11-29 02:23:41
It is known that on x86, for load() and store() operations, the memory orderings memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel do not require any processor instructions for the cache or pipeline: the generated assembly is the same as for std::memory_order_relaxed, and these orderings only restrict compiler optimization: http://www.stdthread.co.uk/forum/index.php?topic=72.0 And this disassembly confirms it for store() (MSVS2012, x86_64):

    std::atomic<int> a;
    a.store(0, std::memory_order_relaxed);
    000000013F931A0D  mov …
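The usual picture, as a sketch (the comments state the instructions typically generated by mainstream compilers on x86-64; the exact choice varies by compiler and version):

    #include <atomic>

    std::atomic<int> a{0};

    int load_acquire() {
        return a.load(std::memory_order_acquire);   // plain `mov` on x86
    }
    void store_release(int v) {
        a.store(v, std::memory_order_release);      // plain `mov` on x86
    }
    void store_seq_cst(int v) {
        a.store(v, std::memory_order_seq_cst);      // `xchg`, or `mov` + `mfence`
    }

Because x86's memory model is already strong (TSO), acquire and release need no extra instructions at the CPU level, but they still matter: they forbid the compiler from reordering the surrounding code.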

How do JVM's implicit memory barriers behave when chaining constructors?

̄綄美尐妖づ submitted on 2019-11-29 01:30:51
Referring to my earlier question on incompletely constructed objects, I have a second question. As Jon Skeet pointed out, there's an implicit memory barrier at the end of a constructor that makes sure that final fields are visible to all threads. But what if a constructor calls another constructor: is there such a memory barrier at the end of each of them, or only at the end of the one that was called first? That is, when the "wrong" solution is:

    public class ThisEscape {
        public ThisEscape(EventSource source) {
            source.registerListener(
                new EventListener() {
                    public void onEvent …