memory-barriers

Does `xchg` encompass `mfence` assuming no non-temporal instructions?

Posted by 假如想象 on 2019-12-01 01:47:44
Question: I have already seen this answer and this answer, but neither appears to be clear and explicit about the equivalence or non-equivalence of mfence and xchg under the assumption of no non-temporal instructions. The Intel instruction reference for xchg mentions that this instruction is useful for implementing semaphores or similar data structures for process synchronization, and further references Chapter 8 of Volume 3A. That reference states the following. For the P6 family processors, locked
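A minimal sketch (not from the question; the variable names are illustrative) of why a locked xchg can stand in for mfence when only ordinary write-back memory accesses are involved: a seq_cst exchange typically compiles to an XCHG instruction on x86, which is implicitly LOCKed and acts as a full barrier for regular loads and stores. Non-temporal stores fall outside this guarantee, which is exactly the caveat the question sets aside.

```cpp
#include <atomic>

std::atomic<int> guard{0};
int data = 0;

void publisher() {
    data = 42;                                     // ordinary (non-NT) store
    // On x86 this exchange typically compiles to XCHG, an implicitly
    // LOCKed instruction that is a full barrier for ordinary accesses,
    // much as MFENCE would be in this position.
    guard.exchange(1, std::memory_order_seq_cst);
}

void consumer() {
    if (guard.load(std::memory_order_seq_cst) == 1) {
        int v = data;   // guaranteed to observe 42
        (void)v;
    }
}
```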

std::memory_order_relaxed atomicity with respect to the same atomic variable

Posted by 僤鯓⒐⒋嵵緔 on 2019-11-30 17:35:31
Question: The cppreference documentation about memory orders says: Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor). Does this mean that relaxed memory ordering doesn't actually result in atomicity with respect to the same variable? But rather just results in
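A hedged sketch (not from the question) illustrating the distinction: relaxed operations on a single atomic are still atomic read-modify-writes, so concurrent increments are never lost; relaxed only drops ordering guarantees relative to other memory locations.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> counter{0};

void worker() {
    for (int i = 0; i < 100000; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);  // atomic RMW, no ordering
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // Atomicity alone guarantees no lost updates, even with relaxed ordering.
    assert(counter.load(std::memory_order_relaxed) == 200000);
}
```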

Volatile and Thread.MemoryBarrier in C#

Posted by 最后都变了- on 2019-11-30 13:01:52
To implement lock-free code for a multithreaded application I used volatile variables. Theoretically: the volatile keyword is simply used to make sure that all threads see the most up-to-date value of a volatile variable; so if thread A updates the variable's value and thread B reads that variable just after the update has happened, it will see the most recent value written by thread A. As I read in the book C# 4.0 in a Nutshell, this is incorrect, because applying volatile doesn't prevent a write followed by a read from being swapped. Could this problem be solved by putting
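A C++ analogue (a sketch, not C#; C# volatile is approximated here by release stores and acquire loads) of the write-then-read reordering the book warns about, and the full fence that rules it out:

```cpp
#include <atomic>

std::atomic<int> a{0}, b{0};

int thread_a() {
    a.store(1, std::memory_order_release);             // "volatile write"
    // Without a full fence here, the read below may effectively move
    // ahead of the store above, so both threads could observe 0.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return b.load(std::memory_order_acquire);          // "volatile read"
}

int thread_b() {
    b.store(1, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return a.load(std::memory_order_acquire);
}
// With the fences in place, thread_a() and thread_b() cannot both return 0.
```

With only the release/acquire accesses and no fences, both functions returning 0 is a permitted outcome, which mirrors the store-load reordering problem described in the book.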

Is a memory barrier an instruction that the CPU executes, or is it just a marker?

Posted by 点点圈 on 2019-11-30 11:13:48
I am trying to understand what a memory barrier is, exactly. Based on what I know so far, a memory barrier (for example, mfence) is used to prevent the re-ordering of instructions from before to after and from after to before the memory barrier. This is an example of a memory barrier in use:

    instruction 1
    instruction 2
    instruction 3
    mfence
    instruction 4
    instruction 5
    instruction 6

Now my question is: is the mfence instruction just a marker telling the CPU in what order to execute the instructions? Or is it an instruction that the CPU actually executes, like it executes other instructions (for
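A small sketch (illustrative, not from the question) that makes the distinction concrete: on x86, a sequentially consistent fence is typically lowered to a real instruction (mfence, or a dummy locked operation), while acquire/release fences usually emit no instruction at all and act purely as compiler-reordering markers.

```cpp
#include <atomic>

void full_fence(int* p) {
    p[0] = 1;                                             // store before the fence
    std::atomic_thread_fence(std::memory_order_seq_cst);  // typically emits MFENCE on x86
    p[1] = 2;                                             // store after the fence
}

void release_fence(int* p) {
    p[0] = 1;
    std::atomic_thread_fence(std::memory_order_release);  // usually no instruction on x86:
    p[1] = 2;                                             // a compiler-only barrier
}
```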

Memory barrier vs Interlocked impact on memory cache coherency timing

Posted by 萝らか妹 on 2019-11-30 06:48:50
Simplified question: Is there a difference in the timing of memory cache coherency (or "flushing") caused by Interlocked operations compared to memory barriers? Let's consider, in C#, any Interlocked operation vs Thread.MemoryBarrier(). I believe there is a difference. Background: I have read quite a bit of information about memory barriers - all about their effect of preventing specific types of memory-access reordering - but I couldn't find consistent information on whether they should cause immediate flushing of read/write queues. I actually found a few sources mentioning that there is NO guarantee
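A C++ sketch (illustrative; Interlocked and Thread.MemoryBarrier are C# APIs, approximated here by an atomic read-modify-write and a standalone fence) of the two publication patterns being compared. Neither one "flushes" caches; cache coherence propagates stores on its own, and the RMW or fence only constrains the order in which this thread's accesses may become visible.

```cpp
#include <atomic>

std::atomic<int> flag{0};
int payload = 0;

// Variant 1: an atomic RMW (roughly what an Interlocked operation does);
// on x86 the locked instruction is itself a full barrier.
void publish_with_rmw() {
    payload = 1;
    flag.exchange(1);                       // seq_cst read-modify-write
}

// Variant 2: a standalone full fence (roughly Thread.MemoryBarrier())
// followed by a plain store.
void publish_with_fence() {
    payload = 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    flag.store(1, std::memory_order_relaxed);
}
```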

Acquire/release semantics with 4 threads

Posted by 无人久伴 on 2019-11-30 06:30:27
I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.

    #include <atomic>
    #include <thread>
    #include <assert.h>

    std::atomic<bool> x,y;
    std::atomic<int> z;

    void write_x() {
        x.store(true,std::memory_order_release);
    }
    void write_y() {
        y.store(true,std::memory_order_release);
    }
    void read_x_then_y() {
        while(!x.load(std::memory_order_acquire));
        if(y.load(std::memory_order_acquire)) ++z;
    }
    void read_y_then_x() {
        while(!y.load(std::memory_order_acquire));
        if(x.load(std::memory_order_acquire)) ++z;
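For contrast, a minimal self-contained sketch (not the book's listing; the driver code is assumed) of the same four threads using memory_order_seq_cst. Acquire/release only synchronizes each reader with the writer whose store it loads, so the two readers may see the independent stores to x and y in opposite orders; with seq_cst all threads agree on one total order of the two stores and the assertion cannot fire.

```cpp
#include <atomic>
#include <thread>
#include <cassert>

std::atomic<bool> x{false}, y{false};
std::atomic<int> z{0};

void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }

void read_x_then_y() {
    while (!x.load(std::memory_order_seq_cst));
    if (y.load(std::memory_order_seq_cst)) ++z;
}
void read_y_then_x() {
    while (!y.load(std::memory_order_seq_cst));
    if (x.load(std::memory_order_seq_cst)) ++z;
}

int main() {
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    // Cannot fire: both readers observe the two stores in the same order.
    assert(z.load() != 0);
}
```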

Which is a better write barrier on x86: lock+addl or xchgl?

Posted by 孤街浪徒 on 2019-11-30 05:50:46
Question: The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference and which is better? Does x86 also require read barrier instructions? RE2 defines its read barrier function as a no-op on x86, while Linux defines it as either lfence or a no-op depending on whether SSE2 is available. When is lfence required? Answer 1: lock; addl $0,0(%%esp) is faster in the case where we are testing the 0 state of the lock variable at (%%esp)
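A hedged sketch (GCC/Clang extended asm; the 32-bit form mirrors the kernel snippet quoted above, and the helper names are made up) of the two idioms being compared. Both execute a LOCKed operation on a dummy memory location, which is what makes them full barriers on x86:

```cpp
// Dummy locked add to the top of the stack (32-bit x86 form, as in the
// question): clobbers no general register, only flags and memory ordering.
static inline void barrier_lock_addl() {
    asm volatile("lock; addl $0, 0(%%esp)" ::: "memory", "cc");
}

// XCHG with a memory operand carries an implicit LOCK prefix, so it is a
// full barrier too, but it ties up a register for the exchange.
static inline void barrier_xchg() {
    static int slot;
    int tmp = 0;
    asm volatile("xchgl %0, %1" : "+r"(tmp), "+m"(slot) : : "memory");
}
```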

What exact rules in the C++ memory model prevent reordering before acquire operations?

Posted by 守給你的承諾、 on 2019-11-29 16:47:28
Question: I have a question regarding the order of operations in the following code:

    std::atomic<int> x;
    std::atomic<int> y;
    int r1;
    int r2;

    void thread1() {
        y.exchange(1, std::memory_order_acq_rel);
        r1 = x.load(std::memory_order_relaxed);
    }

    void thread2() {
        x.exchange(1, std::memory_order_acq_rel);
        r2 = y.load(std::memory_order_relaxed);
    }

Given the description of std::memory_order_acquire on the cppreference page (https://en.cppreference.com/w/cpp/atomic/memory_order), that A load operation with this
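For reference, a minimal sketch (not from the question; the main driver is assumed) of the same pattern with memory_order_seq_cst everywhere. With acq_rel exchanges and relaxed loads, the C++ abstract machine still allows r1 == 0 && r2 == 0, because neither exchange synchronizes with anything in the other thread (even though x86 hardware will not produce that result, since a locked exchange is a full barrier there). With seq_cst, all four operations fall into a single total order and that outcome is excluded.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() {
    y.exchange(1, std::memory_order_seq_cst);
    r1 = x.load(std::memory_order_seq_cst);
}
void thread2() {
    x.exchange(1, std::memory_order_seq_cst);
    r2 = y.load(std::memory_order_seq_cst);
}

int main() {
    std::thread a(thread1), b(thread2);
    a.join();
    b.join();
    // With seq_cst throughout, at least one of r1, r2 must be 1.
}
```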

Why can MemoryBarrier be implemented as a call to xchg?

Posted by 一世执手 on 2019-11-29 14:51:57
On MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms684208(v=vs.85).aspx), MemoryBarrier is implemented as a call to xchg:

    // x86
    FORCEINLINE VOID MemoryBarrier(VOID)
    {
        LONG Barrier;
        __asm {
            xchg Barrier, eax
        }
    }

I can't find the relevant material in the Software Developer's Manual; please tell me the reason. From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3, "System Programming Guide", section 8.2.5, "Strengthening or Weakening the Memory-Ordering Model": Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a
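In portable C++ the same full-barrier effect is usually expressed as a sequentially consistent fence (a sketch, not the Windows header); compilers typically lower it on x86 to MFENCE or to a dummy locked read-modify-write such as XCHG, which is why the two forms are interchangeable here.

```cpp
#include <atomic>

inline void memory_barrier() {
    // Full fence, including StoreLoad; on x86 this typically becomes MFENCE
    // or a LOCKed RMW on a dummy location (the effect of the xchg above).
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```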

Do x86 SSE instructions have an automatic release-acquire order?

Posted by 此生再无相见时 on 2019-11-29 13:51:33
As we know from C11 memory_order (http://en.cppreference.com/w/c/atomic/memory_order), and the same from C++11 std::memory_order (http://en.cppreference.com/w/cpp/atomic/memory_order): On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or performing non-atomic loads earlier than the atomic load-acquire). But is this true for x86-SSE
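A hedged sketch (intrinsics from <immintrin.h>; the buffer and flag names are illustrative) of where the distinction matters: ordinary SSE loads and stores to write-back memory follow the normal x86 ordering, so a release store compiles to a plain MOV, but non-temporal streaming stores are weakly ordered and need an explicit sfence before the data is published.

```cpp
#include <immintrin.h>
#include <atomic>

alignas(16) float buffer[4];
std::atomic<bool> ready{false};

void publish_regular(__m128 v) {
    _mm_store_ps(buffer, v);                        // ordinary SSE store: follows
                                                    // normal x86 store ordering
    ready.store(true, std::memory_order_release);   // plain MOV on x86
}

void publish_non_temporal(__m128 v) {
    _mm_stream_ps(buffer, v);                       // non-temporal store: weakly ordered
    _mm_sfence();                                   // make the streaming store globally
                                                    // visible before the flag is set
    ready.store(true, std::memory_order_release);
}
```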