memory-barriers

How do memory barriers/fences inhibit instruction reordering carried out by the CPU?

送分小仙女 submitted on 2019-12-11 00:56:37
Question: As far as I know, both the compiler and the CPU can carry out instruction reordering. By 'carried out by the CPU', I mean that I do not care about instruction reordering done by the compiler, nor about reordering caused by the store buffer and CPU cache. For reordering caused by the store buffer and CPU cache, which was discussed in this paper, I have already understood how memory barriers inhibit such reordering (memory reordering). What I care about is this kind of reordering: Source code: data=1; //statement1 ready…

Compile time barriers - compiler code reordering - gcc and pthreads

巧了我就是萌 submitted on 2019-12-10 23:54:47
Question: AFAIK there are pthread functions that act as memory barriers (e.g., here: clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about a compile-time barrier, i.e., is the compiler (especially gcc) aware of this? In other words, is pthread_create() a reason for gcc not to perform reordering? For example, in the code: a = 1; pthread_create(...); is it certain that reordering will not take place? What about invocations from different functions: void fun(void) { pthread_create(…

How std::memory_order_seq_cst works

不问归期 submitted on 2019-12-10 14:37:06
Question: I took the example of std::memory_order_seq_cst from http://en.cppreference.com/w/cpp/atomic/memory_order : #include <thread> #include <atomic> #include <cassert> std::atomic<bool> x = {false}; std::atomic<bool> y = {false}; std::atomic<int> z = {0}; void write_x() { x.store(true, std::memory_order_seq_cst); } void write_y() { y.store(true, std::memory_order_seq_cst); } void read_x_then_y() { while (!x.load(std::memory_order_seq_cst)) ; if (y.load(std::memory_order_seq_cst)) { ++z; } } void …

Lock-free programming: reordering and memory order semantics

拥有回忆 submitted on 2019-12-10 11:45:52
Question: I am trying to find my feet in lock-free programming. Having read different explanations of memory ordering semantics, I would like to clear up what reordering may happen. As far as I understand, instructions may be reordered by the compiler (due to optimization when the program is compiled) and by the CPU (at runtime?). For the relaxed semantics, cppreference provides the following example: // Thread 1: r1 = y.load(memory_order_relaxed); // A x.store(r1, memory_order_relaxed); // B //…

Possible to use C11 fences to reason about writes from other threads?

无人久伴 submitted on 2019-12-10 11:19:28
Question: Adve and Gharachorloo's report, in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency. My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the…

Volatile and Thread.MemoryBarrier in C#

岁酱吖の submitted on 2019-12-08 23:47:07
Question: To implement lock-free code for a multithreaded application I used volatile variables. In theory, the volatile keyword is simply used to make sure that all threads see the most up-to-date value of a volatile variable; so if thread A updates the variable's value and thread B reads that variable just after the update has happened, it will see the most recent value written by thread A. As I read in the book C# 4.0 in a Nutshell, this is incorrect, because applying volatile doesn't…

Using memory barriers to force in-order execution

泪湿孤枕 submitted on 2019-12-08 20:45:44
Question: Pursuing my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function inside code compiled with compiler optimization, and could therefore implement a software semaphore using algorithms like Peterson's or Dekker's that require no out-of-order execution, I have tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the gcc builtin HW barrier __sync_synchronize: #include…

Which memory barrier does glGenerateMipmap require?

妖精的绣舞 submitted on 2019-12-08 17:12:20
Question: I've written to the first mipmap level of a texture using GL_ARB_shader_image_load_store. The documentation states that I need to call glMemoryBarrier before I use the contents of this image in other operations, in order to flush the caches appropriately. For instance, before I do a glTexSubImage2D operation, I need to issue GL_TEXTURE_UPDATE_BARRIER_BIT, and before I issue a draw call using a shader that samples that texture, I need to issue GL_TEXTURE_FETCH_BARRIER_BIT. However, which…

std::memory_order_relaxed and initialization

谁都会走 submitted on 2019-12-08 10:50:48
Question: Is the following guaranteed to print 1 followed by 2? auto&& atomic = std::atomic<int>{0}; std::atomic<int>* pointer = nullptr; // thread 1 auto&& value = std::atomic<int>{1}; pointer = &value; atomic.store(1, std::memory_order_relaxed); while (atomic.load(std::memory_order_relaxed) != 2) {} cout << value.load(std::memory_order_relaxed) << endl; // thread 2 while (atomic.load(std::memory_order_relaxed) != 1) {} cout << pointer->load(std::memory_order_relaxed) << endl; pointer->fetch_add(1,…

Preventing of Out of Thin Air values with a memory barrier in C++

北城余情 submitted on 2019-12-08 06:42:57
Question: Let's consider the following two-thread concurrent program in C++ (x and y are globals, r1 and r2 are thread-local, and stores and loads of int are atomic; memory model = C++11): int x = 0, y = 0 r1 = x | r2 = y y = r1 | x = r2 A compiler is allowed to compile it as: int x = 0, y = 0 r1 = x | r2 = 42 y = r1 | x = r2 | if(y != 42) | x = r2 = y And, while this is intra-thread consistent, it can lead to wild results, because it is possible that execution of this program yields (x, y) = (42, 42). It is…