memory-barriers

sequentially-consistent atomic load on x86

天大地大妈咪最大 submitted on 2019-12-08 02:18:23
Question: I'm interested in the sequentially-consistent load operation on x86. As far as I can see from the assembler listing generated by the compiler, it is implemented as a plain load on x86; however, plain loads, as far as I know, are guaranteed to have acquire semantics, while plain stores are guaranteed to have release semantics. A sequentially-consistent store is implemented as a locked xchg, while a load is a plain load. That sounds strange to me; could you please explain this in detail? Added: I just found on the internet that
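A minimal C++ sketch of what the question describes, using std::atomic; the comments note the typical x86 code generation (a plain MOV for the seq_cst load, and XCHG or MOV plus MFENCE for the seq_cst store), which is a common compiler choice rather than the only legal one:

    #include <atomic>

    std::atomic<int> x{0};

    int seq_cst_load() {
        // On x86 this usually compiles to a plain MOV: every x86 load already
        // has acquire semantics, so the expensive part of sequential
        // consistency (the StoreLoad barrier) is placed on the store side.
        return x.load(std::memory_order_seq_cst);
    }

    void seq_cst_store(int v) {
        // Usually compiled as XCHG (implicitly locked) or MOV followed by
        // MFENCE; either drains the store buffer and acts as a full barrier,
        // which is why the load can remain a plain MOV.
        x.store(v, std::memory_order_seq_cst);
    }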

Do concurrent interlocked operations and reads require a memory barrier or locking?

南笙酒味 submitted on 2019-12-07 18:05:24
Question: This is a simple problem, but after reading Why do I need a memory barrier? I'm very confused about it. In the example below, assume different threads are repeatedly calling Increment and Counter: class Foo { int _counter = 0; public int Counter { get { return _counter; } } public void Increment() { Interlocked.Increment(ref _counter); } } Sorry if I'm misinterpreting Why do I need a memory barrier?, but it seems to suggest that the class above might not be providing a freshness guarantee
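The class in the question is C#; for comparison only, here is a C++ analogue of the same pattern using std::atomic (a sketch with names of my own choosing, not the poster's code): the increment is an atomic read-modify-write and the getter is an ordinary atomic load.

    #include <atomic>

    // C++ analogue of the Foo class above: Interlocked.Increment becomes an
    // atomic fetch_add, and the Counter getter becomes an atomic load.
    class Foo {
        std::atomic<int> counter_{0};
    public:
        int counter() const { return counter_.load(); }   // seq_cst by default
        void increment() { counter_.fetch_add(1); }       // atomic read-modify-write
    };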

Do I need to use volatile keyword for memory access in critical section?

若如初见. submitted on 2019-12-07 14:21:20
Question: I am writing code for a single-processor 32-bit microcontroller using gcc. I need to consume time-stamped objects from a linked list. Another part of the code, which could be asynchronous (maybe in an ISR), adds them to the list. The critical section is implemented by turning interrupts off and using the barrier() function. I'm confused about where gcc optimization could break my code by caching pointers to the list items (the next most recent item to remove, the list head, or the free list). I don't want
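A sketch of the kind of critical section the question describes, written for GCC; disable_irq()/enable_irq() are hypothetical MCU-specific calls, and the "memory" clobber in barrier() is what keeps GCC from carrying cached copies of the list pointers in registers across the section:

    // Hypothetical interrupt-control primitives for the MCU in question.
    extern void disable_irq(void);
    extern void enable_irq(void);

    // Compiler-only barrier: forces GCC to discard register-cached copies of
    // memory-resident objects such as the list head or free-list pointers.
    #define barrier() __asm__ __volatile__("" ::: "memory")

    struct item { struct item *next; unsigned long timestamp; };
    struct item *head;

    struct item *pop_item(void) {
        struct item *it;
        disable_irq();
        barrier();            // re-read 'head' after interrupts are off
        it = head;
        if (it)
            head = it->next;
        barrier();            // commit the update before re-enabling interrupts
        enable_irq();
        return it;
    }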

Memory Barriers: a Hardware View for Software Hackers Example 3

China☆狼群 submitted on 2019-12-07 12:12:00
Question: I am copying the text for that figure from the original paper, Memory Barriers: a Hardware View for Software Hackers. Table 4 shows three code fragments, executed concurrently by CPUs 0, 1, and 2. All variables are initially zero. Note that neither CPU 1 nor CPU 2 can proceed to line 5 until they see CPU 0's assignment to "b" on line 3. Once CPUs 1 and 2 have executed their memory barriers on line 4, they are both guaranteed to see all assignments by CPU 0 preceding its memory barrier on line
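The paper's exact Table 4 is not reproduced here; the sketch below only captures the pattern the quoted sentence describes (CPU 0 stores, executes a barrier, then sets "b"; CPUs 1 and 2 spin on "b", execute their own barrier, and are then guaranteed to see CPU 0's earlier stores), written with C++ relaxed atomics and explicit fences standing in for smp_mb():

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> a{0}, b{0};

    void cpu0() {
        a.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);   // CPU 0's barrier
        b.store(1, std::memory_order_relaxed);
    }

    void cpu1_or_2() {
        while (b.load(std::memory_order_relaxed) == 0) {}      // wait for "b"
        std::atomic_thread_fence(std::memory_order_seq_cst);   // barrier on "line 4"
        // After this fence, the CPU is guaranteed to see every store CPU 0
        // made before its own barrier, including a == 1.
        assert(a.load(std::memory_order_relaxed) == 1);
    }

    int main() {
        std::thread t0(cpu0), t1(cpu1_or_2), t2(cpu1_or_2);
        t0.join(); t1.join(); t2.join();
    }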

How does the piggybacking of current thread variable in ReentrantLock.Sync work?

心已入冬 submitted on 2019-12-06 22:32:32
Question: I read some of the details of the implementation of ReentrantLock in "Java Concurrency in Practice", section 14.6.1, and something in the annotation confuses me: Because the protected state-manipulation methods have the memory semantics of a volatile read or write and ReentrantLock is careful to read the owner field only after calling getState and write it only before calling setState, ReentrantLock can piggyback on the memory semantics of the synchronization state, and thus avoid
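The quoted technique is Java-specific (AQS's volatile state field), but the idea carries over: a field written just before a release store and read just after an acquire load inherits that ordering. A C++ sketch of the same piggybacking, with made-up names and relaxed accesses riding on the release/acquire of 'state':

    #include <atomic>
    #include <thread>

    // 'owner' never carries its own ordering: it is written before the release
    // store to 'state' and read after the acquire load of 'state', so it
    // piggybacks on the memory semantics of 'state' (the role getState/setState
    // play for ReentrantLock's owner field).
    struct MiniLock {
        std::atomic<std::thread::id> owner{std::thread::id{}};
        std::atomic<int> state{0};

        void lock_uncontended() {
            owner.store(std::this_thread::get_id(), std::memory_order_relaxed);
            state.store(1, std::memory_order_release);   // publishes 'owner' too
        }

        bool held_by_current_thread() {
            if (state.load(std::memory_order_acquire) == 0)
                return false;
            return owner.load(std::memory_order_relaxed) == std::this_thread::get_id();
        }
    };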

Difference between memory barriers and lock-prefixed instructions

时间秒杀一切 submitted on 2019-12-06 16:22:13
Question: In the article Memory Barriers and JVM Concurrency!, I was told that volatile is implemented with memory barrier instructions, while synchronized and atomic are implemented with lock-prefixed instructions. But I found the code below in another article. Java code: volatile Singleton instance = new Singleton(); assembly instructions (x86): 0x01a3de1d: movb $0x0,0x1104800(%esi); 0x01a3de24: lock addl $0x0,(%esp); So which one is right? And what is the difference between memory barriers and lock
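The two descriptions are compatible: on x86 a full barrier can be encoded either as MFENCE or as any lock-prefixed instruction, and lock addl $0x0,(%esp) is a lock-prefixed no-op used as a cheap full barrier after the volatile store. A C++ stand-in for that store, with the usual code-generation options noted in comments (what a particular JIT or compiler emits may differ):

    #include <atomic>

    std::atomic<int> instance_ready{0};

    void publish() {
        // A sequentially-consistent store, the C++ counterpart of the Java
        // volatile write in the question.  On x86, compilers commonly emit
        //   mov  ...                 ; the store itself
        //   mfence                   ; dedicated full-barrier instruction
        // or
        //   mov  ...
        //   lock addl $0x0,(%rsp)    ; lock-prefixed no-op, also a full barrier
        // or fold both into a single xchg.  The lock prefix and mfence are two
        // encodings of the same StoreLoad barrier, not two different mechanisms.
        instance_ready.store(1, std::memory_order_seq_cst);
    }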

Possible to use C11 fences to reason about writes from other threads?

天涯浪子 submitted on 2019-12-06 14:22:58
Adve and Gharachorloo's report, in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency: My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the property that P2 sees P1's write before P3 does, yet somehow P3 sees P2's write very quickly. The reason
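A sketch of one construction, assuming the shape the question describes (P1 writes A; P2 waits for A, then writes B; P3 waits for B, then reads A into register1), written in C++ syntax with only relaxed accesses plus fences; the C11 atomic_thread_fence equivalents behave the same way. P2's release fence synchronizes with P3's acquire fence, so P2's read of A (which saw 1) happens-before P3's read of A, and read-read coherence then forbids P3 from reading the older value 0.

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> A{0}, B{0};

    void p1() {
        A.store(1, std::memory_order_relaxed);
    }

    void p2() {
        while (A.load(std::memory_order_relaxed) == 0) {}     // reads P1's store
        std::atomic_thread_fence(std::memory_order_release);  // release fence
        B.store(1, std::memory_order_relaxed);
    }

    void p3() {
        while (B.load(std::memory_order_relaxed) == 0) {}     // reads P2's store
        std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence
        int register1 = A.load(std::memory_order_relaxed);
        // P2's read of A happens-before this read, so 0 cannot be returned.
        assert(register1 == 1);
    }

    int main() {
        std::thread t1(p1), t2(p2), t3(p3);
        t1.join(); t2.join(); t3.join();
    }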

The order in which the L1 cache controller processes memory requests from the CPU

[亡魂溺海] submitted on 2019-12-06 13:17:15
Under the total store order (TSO) memory consistency model, an x86 CPU has a write buffer to buffer write requests and can serve reordered read requests from the write buffer. It is said that the write requests in the write buffer exit and are issued toward the cache hierarchy in FIFO order, which is the same as program order. I am curious: to serve the write requests issued from the write buffer, does the L1 cache controller handle the write requests, complete cache coherence for them, and insert the data into the L1 cache in the same order in which they were issued? I think you're
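The write buffer is also what produces the one reordering TSO does expose to software: a later load overtaking an earlier, still-buffered store. A C++ store-buffering litmus test (relaxed atomics so the compiler adds no fences of its own) that can legitimately end with r1 == r2 == 0 on x86, even though each core drains its write buffer toward L1 in FIFO order:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void cpu0() {
        x.store(1, std::memory_order_relaxed);   // may sit in the write buffer...
        r1 = y.load(std::memory_order_relaxed);  // ...while this load reads the cache
    }

    void cpu1() {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread a(cpu0), b(cpu1);
        a.join(); b.join();
        std::printf("r1=%d r2=%d\n", r1, r2);    // r1==0 && r2==0 is a permitted outcome
    }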

relaxed ordering as a signal

北城以北 submitted on 2019-12-06 05:13:35
Question: Let's say we have two threads: one that gives a "go" and one that waits for the go to produce something. Is this code correct, or can I get an "infinite loop" because of caching or something like that? std::atomic_bool canGo{false}; void producer() { while(canGo.load(memory_order_relaxed) == false); produce_data(); } void launcher() { canGo.store(true, memory_order_relaxed); } int main() { thread a{producer}; thread b{launcher}; } If this code is not correct, is there a way to flush / invalidate the
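If the launcher also prepares data that the other thread reads after seeing the flag, relaxed ordering is no longer sufficient for that data; a sketch of the release/acquire variant ('config' and the printf are made-up stand-ins for produce_data()):

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<bool> canGo{false};
    int config = 0;

    void launcher() {
        config = 42;                                   // prepared before the signal
        canGo.store(true, std::memory_order_release);  // publishes config as well
    }

    void producer() {
        while (!canGo.load(std::memory_order_acquire)) {}  // spin until "go"
        std::printf("config=%d\n", config);                // guaranteed to print 42
    }

    int main() {
        std::thread a{producer};
        std::thread b{launcher};
        a.join();
        b.join();
    }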

C#/CLR: MemoryBarrier and torn reads

本秂侑毒 submitted on 2019-12-06 03:33:25
Question: Just playing around with concurrency in my spare time, and I wanted to try preventing torn reads without using locks on the reader side, so concurrent readers don't interfere with each other. The idea is to serialize writes via a lock, but use only a memory barrier on the read side. Here's a reusable abstraction that encapsulates the approach I came up with: public struct Sync<T> where T : struct { object write; T value; int version; // incremented with each write public static Sync<T> Create() {
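The excerpt is cut off, but a version counter bumped around each locked write, with a barrier-and-retry on the read side, is essentially a seqlock. A C++ sketch of that underlying technique, not the poster's C# code (with the usual seqlock caveat: the unsynchronized copy of value_ is formally a data race in the C++ model, which the retry loop papers over in practice):

    #include <atomic>
    #include <mutex>

    // Writers serialize on a mutex and bump the version around the write;
    // readers never lock, they just retry if the version changed or was odd
    // (meaning a write was in progress), which is how torn reads are rejected.
    template <typename T>
    class Sync {
        std::mutex write_;
        std::atomic<unsigned> version_{0};   // odd while a write is in progress
        T value_{};

    public:
        void store(const T& v) {
            std::lock_guard<std::mutex> g(write_);
            version_.fetch_add(1, std::memory_order_relaxed);   // now odd
            std::atomic_thread_fence(std::memory_order_release);
            value_ = v;
            std::atomic_thread_fence(std::memory_order_release);
            version_.fetch_add(1, std::memory_order_relaxed);   // even again
        }

        T load() const {
            for (;;) {
                unsigned v1 = version_.load(std::memory_order_acquire);
                T copy = value_;                                 // may be torn...
                std::atomic_thread_fence(std::memory_order_acquire);
                unsigned v2 = version_.load(std::memory_order_relaxed);
                if (v1 == v2 && (v1 & 1u) == 0)                  // ...so validate
                    return copy;
            }
        }
    };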