memory-barriers

Does lock xchg have the same behavior as mfence?

杀马特。学长 韩版系。学妹 submitted on 2019-11-27 02:09:27
What I'm wondering is whether lock xchg behaves like mfence from the perspective of one thread accessing a memory location that is being mutated (let's say at random) by other threads. Does it guarantee I get the most up-to-date value? And for the memory read/write instructions that follow after it? The reason for my confusion is this line from section 8.2.2 of the Intel 64 Developer's Manual, Vol. 3: "Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions." Does this apply across threads? mfence states: "Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction."
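As a minimal sketch of the case both instructions exist for (the two-thread setup and the names r1/r2 are illustrative, not from the question): on x86, compilers typically implement a sequentially consistent store as either xchg or mov followed by mfence, and that full barrier is exactly what forbids the store-then-load reordering below.

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void t1() {
        x.store(1, std::memory_order_seq_cst); // on x86: xchg, or mov + mfence
        r1 = y.load(std::memory_order_seq_cst);
    }

    void t2() {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    }

    int main() {
        std::thread a(t1), b(t2);
        a.join(); b.join();
        // r1 == 0 && r2 == 0 would require a store to pass a later load;
        // the full barrier (xchg or mfence) rules that out.
        assert(r1 == 1 || r2 == 1);
    }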

Acquire/release semantics with 4 threads

泄露秘密 submitted on 2019-11-27 01:50:20
Question: I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.

    #include <atomic>
    #include <thread>
    #include <assert.h>

    std::atomic<bool> x, y;
    std::atomic<int> z;

    void write_x() {
        x.store(true, std::memory_order_release);
    }
    void write_y() {
        y.store(true, std::memory_order_release);
    }
    void read_x_then_y() {
        while (!x.load(std::memory_order_acquire));
        if (y.load(std::memory_order_acquire))
            ++z;
    }
    void read_y_then_x() {
        while (!y.load(std::memory_order_acquire));
        if (x.load(std::memory_order_acquire))
            ++z;
    }

    int main() {
        x = false; y = false; z = 0;
        std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
        a.join(); b.join(); c.join(); d.join();
        assert(z.load() != 0); // can fire
    }
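The point of the listing is that the two release stores are independent: nothing orders them with respect to each other, so read_x_then_y and read_y_then_x may observe them in opposite orders, each seeing the other store as not yet done, and z stays 0. As a contrast (my sketch of the standard fix, not something in the excerpt), making every operation sequentially consistent forbids that outcome:

    // sketch: same program as above, with seq_cst throughout
    void write_x() { x.store(true, std::memory_order_seq_cst); }
    void write_y() { y.store(true, std::memory_order_seq_cst); }
    // ...and both readers load with std::memory_order_seq_cst.
    // All four threads now agree on one total order of the two stores,
    // so at least one reader sees both and increments z.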

Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?

陌路散爱 submitted on 2019-11-27 01:47:12
As we know from a previous answer to "Does it make any sense instruction LFENCE in processors x86/x86_64?", we cannot use SFENCE instead of MFENCE for Sequential Consistency. An answer there suggests that MFENCE = SFENCE + LFENCE, i.e. that LFENCE does something without which we cannot provide Sequential Consistency.

LFENCE makes this reordering impossible:

    SFENCE
    LFENCE
    MOV reg, [addr]

    -- cannot become -->

    MOV reg, [addr]
    SFENCE
    LFENCE

For example, the reordering of

    MOV [addr], reg
    LFENCE

into

    LFENCE
    MOV [addr], reg

is provided by the store-buffer mechanism, which reorders stores past later loads for performance…
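A hedged sketch of why the distinction matters (GCC/Clang inline asm on x86-64; the plain ints and names are illustrative of the hardware-level question, not from the post): in the Dekker pattern, the fence between the store and the subsequent load must block StoreLoad reordering. mfence (or any locked instruction) does that; sfence followed by lfence does not, because the store can still be sitting in the store buffer, not yet globally visible, when the later load executes.

    int X = 0, Y = 0;
    int r1, r2;

    void thread1() {
        X = 1;
        asm volatile("mfence" ::: "memory");  // full StoreLoad barrier: works
        // asm volatile("sfence\n\tlfence" ::: "memory");
        //   ^ NOT sufficient: neither fence orders the prior store
        //     against the following load
        r1 = Y;
    }

    void thread2() { // symmetric
        Y = 1;
        asm volatile("mfence" ::: "memory");
        r2 = X;
    }
    // with mfence, r1 == 0 && r2 == 0 is impossible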

Memory Barrier by lock statement

那年仲夏 submitted on 2019-11-27 00:39:53
Question: I recently read about memory barriers and the reordering issue, and now I have some confusion about it. Consider the following scenario:

    private object _object1 = null;
    private object _object2 = null;
    private bool _usingObject1 = false;

    private object MyObject
    {
        get
        {
            if (_usingObject1) { return _object1; }
            else { return _object2; }
        }
        set
        {
            if (_usingObject1) { _object1 = value; }
            else { _object2 = value; }
        }
    }

    private void Update()
    {
        _usingObject1 = true;
        MyObject = FooMethod();
        //..
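The underlying question is what ordering a lock gives you. A hedged C++ analogue (C#'s lock statement has the same acquire/release shape; the names here are mine): taking a mutex is an acquire operation and releasing it is a release operation, so ordinary reads and writes cannot move out of the critical section in either direction.

    #include <mutex>

    std::mutex m;
    int shared_value = 0; // plain data, protected by m

    void writer() {
        std::lock_guard<std::mutex> guard(m); // lock: acquire barrier
        shared_value = 42;                    // cannot be hoisted above the lock
    }                                         // unlock: release barrier publishes the write

    int reader() {
        std::lock_guard<std::mutex> guard(m); // acquire: sees everything published
        return shared_value;                  // by the previous unlock
    }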

Atomicity on x86

左心房为你撑大大i submitted on 2019-11-27 00:28:45
    8.1.2 Bus Locking
    Intel 64 and IA-32 processors provide a LOCK# signal that is asserted
    automatically during certain critical memory operations to lock the
    system bus or equivalent link. While this output signal is asserted,
    requests from other processors or bus agents for control of the bus are
    blocked. Software can specify other occasions when the LOCK semantics
    are to be followed by prepending the LOCK prefix to an instruction.

This comes from the Intel Manual, Volume 3. It sounds like atomic operations on memory are executed directly on memory (RAM). I am confused because I see "nothing…
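For concreteness (a sketch, not from the question): the LOCK prefix is what a C++ compiler emits for read-modify-write atomics on x86, and on modern CPUs it is normally implemented as a cache-line lock inside the coherence protocol rather than a literal bus lock.

    #include <atomic>

    std::atomic<int> counter{0};

    void hit() {
        // GCC/Clang on x86-64 typically compile this to:
        //     lock addl $1, counter(%rip)
        // The LOCK prefix makes the read-modify-write atomic.
        counter.fetch_add(1, std::memory_order_relaxed);
    }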

Is function call an effective memory barrier for modern platforms?

蓝咒 submitted on 2019-11-27 00:05:30
Question: In a codebase I reviewed, I found the following idiom.

    void notify(struct actor_t act) {
        write(act.pipe, "M", 1);
    }

    // thread A sending data to thread B
    void send(byte *data) {
        global.data = data;
        notify(threadB);
    }

    // in thread B event loop
    read(this.sock, &cmd, 1);
    switch (cmd) {
        case 'M': use_data(global.data); break;
        ...
    }

"Hold it," I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main…
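A hedged sketch of what an explicit version could look like (global_data, the declarations, and the release/acquire choice are my assumptions, not the codebase's): instead of relying on the opaque call as a compiler barrier and on the pipe write/read as the synchronizing event, publish the pointer with release semantics and read it with acquire semantics.

    #include <atomic>

    struct actor_t { int pipe; };
    extern actor_t threadB;
    void notify(actor_t act);      // writes 'M' to act.pipe, as in the excerpt
    void use_data(const char *p);  // hypothetical consumer

    std::atomic<const char *> global_data{nullptr}; // substitute for global.data

    // thread A
    void send(const char *data) {
        global_data.store(data, std::memory_order_release); // publish first,
        notify(threadB);                                    // then wake B
    }

    // thread B, after reading 'M' from the pipe
    void handle_message() {
        use_data(global_data.load(std::memory_order_acquire));
    }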

Is LFENCE serializing on AMD processors?

江枫思渺然 submitted on 2019-11-26 23:16:25
In recent Intel ISA documents the lfence instruction has been defined as serializing the instruction stream (preventing out-of-order execution across it). In particular, the description of the instruction includes this line:

    Specifically, LFENCE does not execute until all prior instructions have
    completed locally, and no later instruction begins execution until
    LFENCE completes.

Note that this applies to all instructions, not just memory load instructions, making lfence more than just a memory-ordering fence. Although this now appears in the ISA documentation, it isn't clear if it is…
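One reason the question matters in practice (a sketch under the assumption that lfence is dispatch-serializing, which Intel documents and which AMD enables on some parts via an MSR): fencing rdtsc so the timestamps are not reordered around the work being measured.

    #include <cstdint>
    #include <x86intrin.h> // _mm_lfence, __rdtsc (GCC/Clang)

    void do_work(); // hypothetical function being measured

    uint64_t timed_work() {
        _mm_lfence();              // no later instruction starts before this
        uint64_t t0 = __rdtsc();
        do_work();
        _mm_lfence();              // wait for the work to complete locally
        uint64_t t1 = __rdtsc();
        return t1 - t0;
    }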

Are mutex lock functions sufficient without volatile?

社会主义新天地 submitted on 2019-11-26 22:59:56
Question: A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10-year-old server CPUs. We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.

    int foo::bar() {
        //...
        // code which may or may not access _protected.
        pthread_mutex_lock(m);
        int ret = _protected;
        pthread_mutex_unlock(m);
        return ret;
    }

My concern is…
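For what the standard actually promises (my summary, hedged): POSIX specifies that pthread_mutex_lock() and pthread_mutex_unlock() synchronize memory, so the compiler and CPU barriers are part of the calls themselves, and volatile adds nothing for this purpose. A minimal sketch of the writer side that pairs with the excerpt's reader:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int _protected = 0; // no volatile needed

    void set_protected(int v) {
        pthread_mutex_lock(&m);   // acquire: implies the needed barriers
        _protected = v;           // an ordinary store suffices under the lock
        pthread_mutex_unlock(&m); // release: publishes the store
    }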

Fastest inline-assembly spinlock

孤街醉人 submitted on 2019-11-26 22:16:26
Question: I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads; for this I have chosen to use spinlocks. I have done some research and speed testing on this, and I found that most implementations are roughly equally fast:

- Microsoft's CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units
- Implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units
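As a baseline for comparison (a portable sketch, not the asker's inline-assembly version): a test-and-set spinlock over std::atomic_flag, with a pause hint in the spin loop, is what hand-written variants are usually measured against.

    #include <atomic>
    #include <immintrin.h> // _mm_pause

    class Spinlock {
        std::atomic_flag flag = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            while (flag.test_and_set(std::memory_order_acquire))
                _mm_pause(); // ease contention while spinning
        }
        void unlock() {
            flag.clear(std::memory_order_release);
        }
    };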

Dependent loads reordering in CPU

被刻印的时光 ゝ submitted on 2019-11-26 21:51:35
Question: I have been reading Memory Barriers: A Hardware View For Software Hackers, a very popular article by Paul E. McKenney. One of the things the paper highlights is that very weakly ordered processors, like the Alpha, can reorder dependent loads, which seems to be a side effect of its partitioned cache.

Snippet from the paper:

    struct el *insert(long key, long data)
    {
        struct el *p;
        p = kmalloc(sizeof(*p), GPF_ATOMIC);
        spin_lock(&mutex);
        p->next = head.next;
        p->key = key;
        p->data = data;
        smp_wmb(); /* order the initialization before publication */
        head.next = p;
        spin_unlock(&mutex);
    }
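The reader side is where dependent-load reordering bites; a sketch following the paper's discussion (head, struct el, and smp_read_barrier_depends() are the Linux kernel's primitives from the paper, and smp_read_barrier_depends() is a no-op everywhere except Alpha):

    struct el *search(long key)
    {
        struct el *p = head.next;
        while (p != &head) {
            smp_read_barrier_depends(); /* needed on Alpha: without it, the  */
            if (p->key == key)          /* loads of p->key / p->data may see */
                return p;               /* pre-initialization values despite */
            p = p->next;                /* the address dependency on p       */
        }
        return NULL;
    }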