memory-barriers

How to achieve a StoreLoad barrier in C++11?

Submitted by 谁说胖子不能爱 on 2020-06-08 04:52:27
Question: I want to write portable code (Intel, ARM, PowerPC...) which solves a variant of a classic problem:

    Initially: X = Y = 0
    Thread A: X = 1; if (!Y) { do something }
    Thread B: Y = 1; if (!X) { do something }

in which the goal is to avoid a situation in which both threads are doing something. (It's fine if neither runs; this isn't a run-exactly-once mechanism.) Please correct me if you see any flaws in my reasoning below. I am aware that I can achieve the goal with memory_order_seq_cst atomic stores

C11 Standalone memory barriers LoadLoad StoreStore LoadStore StoreLoad

Submitted by 左心房为你撑大大i on 2020-05-15 08:23:24
Question: I want to use standalone memory barriers between atomic and non-atomic operations (I think it shouldn't matter at all anyway). I think I understand what a store barrier and a load barrier mean, and also the four types of possible memory reorderings: LoadLoad, StoreStore, LoadStore, StoreLoad. However, I always find the acquire/release concepts confusing, because when reading the documentation, acquire doesn't only speak about loads, but also stores, and release doesn't only speak about stores

Why is LOCK a full barrier on x86?

Submitted by 倖福魔咒の on 2020-05-15 03:45:27
Question: Why does the LOCK prefix cause a full barrier on x86 (and thus drain the store buffer and give sequential consistency)? For LOCKed read-modify-write operations, a full barrier shouldn't be required, and exclusive access to the cache line seems to be sufficient. Is it a design choice, or is there some other limitation?

Answer 1: Long ago, before the Intel 80486, Intel processors didn't have on-chip caches or write buffers. Therefore, by design, all writes become immediately globally visible in

Why does using MFENCE with store instruction block prefetching in L1 cache?

Submitted by 扶醉桌前 on 2020-03-18 04:46:11
Question: I have an object 64 bytes in size:

    typedef struct _object {
        int value;
        char pad[60];
    } object;

In main I initialize an array of object:

    volatile object *array;
    int arr_size = 1000000;
    array = (object *) malloc(arr_size * sizeof(object));
    for (int i = 0; i < arr_size; i++) {
        array[i].value = 1;
        _mm_clflush(&array[i]);
    }
    _mm_mfence();

Then I loop again through each element. This is the loop I am counting events for:

    int tmp;
    for (int i = 0; i < arr_size - 105; i++) {
        array[i].value = 2;
        //tmp = array[i

C11 Atomic Acquire/Release and x86_64 lack of load/store coherence?

Submitted by 时光怂恿深爱的人放手 on 2020-03-17 10:58:59
Question: I am struggling with Section 5.1.2.4 of the C11 Standard, in particular the semantics of Release/Acquire. I note that https://preshing.com/20120913/acquire-and-release-semantics/ (amongst others) states that:

    ... Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order.

So, for the following:

    typedef struct test_struct {
        _Atomic(bool) ready;
        int v1;
        int v2;
    } test_struct_t;

    extern void test_init(test_struct_t* ts,

C# SemaphoreSlim array elements read/write synchronization [closed]

Submitted by 醉酒当歌 on 2020-03-04 20:05:42
Question (closed: needs to be more focused, not currently accepting answers): My blocking ring queue doesn't use lock or Mutex, only two SemaphoreSlim instances (one blocks at 0 and the other at the maximum element count, so the write and read parts of the array never intersect) and two int indexes modified by Interlocked.Decrement (they don't determine the write/read index directly, but keep it unique and moving correctly).

How to guarantee that load completes before store occurs?

Submitted by 此生再无相见时 on 2020-03-03 07:03:19
Question: In the following code, how can one ensure that ptr is not incremented until after *ptr has been loaded/assigned/"extracted"?

    extern int arr[some_constexpr]; // assume pre-populated
    extern int *ptr;                // assume points into non-atomic arr

    int a = *ptr;
    // want "memory barrier/fence" here
    ++ptr;

Would an atomic pointer ensure the correct ordering/sequencing?

    #include <atomic>
    extern int arr[some_constexpr];
    extern std::atomic<int*> ptr;

    int a = *(ptr.load()); // implicit "memory barrier" achieved