memory-barriers

How do memory barriers/fences inhibit instruction reordering carried out by the CPU?

送分小仙女 submitted on 2019-12-11 00:56:37
Question: As far as I know, both the compiler and the CPU can carry out instruction reordering. By 'carried out by the CPU', I mean that I do not care about instruction reordering done by the compiler, nor about reordering caused by the store buffer and CPU cache. For reordering caused by the store buffer and CPU cache, which was discussed in this paper, I have already understood how memory barriers inhibit such reordering (memory reordering). What I care about is this kind of reordering: Source code: data=1; //statement1 ready…

Compile time barriers - compiler code reordering - gcc and pthreads

巧了我就是萌 submitted on 2019-12-10 23:54:47
Question: AFAIK there are pthread functions that act as memory barriers (e.g., here: clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about a compile-time barrier, i.e., is the compiler (especially gcc) aware of this? In other words, is pthread_create() a reason for gcc not to perform reordering? For example, in the code: a = 1; pthread_create(...); is it certain that reordering will not take place? What about invocations from different functions: void fun(void) { pthread_create(…

How std::memory_order_seq_cst works

不问归期 submitted on 2019-12-10 14:37:06
Question: I took the example of std::memory_order_seq_cst from http://en.cppreference.com/w/cpp/atomic/memory_order : #include <thread> #include <atomic> #include <cassert> std::atomic<bool> x = {false}; std::atomic<bool> y = {false}; std::atomic<int> z = {0}; void write_x() { x.store(true, std::memory_order_seq_cst); } void write_y() { y.store(true, std::memory_order_seq_cst); } void read_x_then_y() { while (!x.load(std::memory_order_seq_cst)) ; if (y.load(std::memory_order_seq_cst)) { ++z; } } void …

Lock-free programming: reordering and memory order semantics

拥有回忆 submitted on 2019-12-10 11:45:52
Question: I am trying to find my feet in lock-free programming. Having read different explanations of memory ordering semantics, I would like to clear up what reordering may happen. As far as I understand, instructions may be reordered by the compiler (due to optimization when the program is compiled) and by the CPU (at runtime?). For the relaxed semantics, cppreference provides the following example: // Thread 1: r1 = y.load(memory_order_relaxed); // A x.store(r1, memory_order_relaxed); // B //…

Possible to use C11 fences to reason about writes from other threads?

无人久伴 submitted on 2019-12-10 11:19:28
Question: Adve and Gharachorloo's report, in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency. My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the…

Volatile and Thread.MemoryBarrier in C#

岁酱吖の submitted on 2019-12-08 23:47:07
Question: To implement lock-free code for a multithreaded application I used volatile variables. In theory, the volatile keyword is simply used to make sure that all threads see the most up-to-date value of a volatile variable; so if thread A updates the variable's value and thread B reads that variable just after the update has happened, it will see the most recent value written by thread A. As I read in the book C# 4.0 in a Nutshell, this is incorrect, because applying volatile doesn't…

Using memory barriers to force in-order execution

泪湿孤枕 submitted on 2019-12-08 20:45:44
Question: Pursuing my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function inside code compiled with compiler optimization, and could therefore implement a software semaphore using algorithms like Peterson's or Dekker's that require no out-of-order execution, I have tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the gcc builtin HW barrier __sync_synchronize: #include…

Which memory barrier does glGenerateMipmap require?

妖精的绣舞 submitted on 2019-12-08 17:12:20
Question: I've written to the first mipmap level of a texture using GL_ARB_shader_image_load_store. The documentation states that I need to call glMemoryBarrier before I use the contents of this image in other operations, in order to flush the caches appropriately. For instance, before I do a glTexSubImage2D operation, I need to issue GL_TEXTURE_UPDATE_BARRIER_BIT, and before I issue a draw call using a shader that samples that texture, I need to issue GL_TEXTURE_FETCH_BARRIER_BIT. However, which…

std::memory_order_relaxed and initialization

谁都会走 submitted on 2019-12-08 10:50:48
Question: Is the following guaranteed to print 1 followed by 2? auto&& atomic = std::atomic<int>{0}; std::atomic<int>* pointer = nullptr; // thread 1 auto&& value = std::atomic<int>{1}; pointer = &value; atomic.store(1, std::memory_order_relaxed); while (atomic.load(std::memory_order_relaxed) != 2) {} cout << value.load(std::memory_order_relaxed) << endl; // thread 2 while (atomic.load(std::memory_order_relaxed) != 1) {} cout << pointer->load(std::memory_order_relaxed) << endl; pointer->fetch_add(1,…

Preventing of Out of Thin Air values with a memory barrier in C++

北城余情 submitted on 2019-12-08 06:42:57
Question: Let's consider the following two-thread concurrent program in C++ (x and y are globals, r1 and r2 are thread-local, and stores and loads of int are atomic; memory model = C++11): int x = 0, y = 0 r1 = x | r2 = y y = r1 | x = r2 A compiler is allowed to compile it as: int x = 0, y = 0 r1 = x | r2 = 42 y = r1 | x = r2 | if(y != 42) | x = r2 = y And, while this is intra-thread consistent, it can lead to wild results, because it is possible that execution of this program yields (x, y) = (42, 42). It is…