memory-barriers

Does lock xchg have the same behavior as mfence?

杀马特。学长 韩版系。学妹 submitted on 2019-11-27 02:09:27
What I'm wondering is whether lock xchg behaves like mfence from the perspective of one thread accessing a memory location that is being mutated (let's say at random) by other threads. Does it guarantee I get the most up-to-date value? And for the memory read/write instructions that follow after it? The reason for my confusion is this line from section 8.2.2 of the Intel 64 Developer's Manual, Vol. 3: "Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions." Does this apply across threads? mfence states: "Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction."
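As a minimal sketch of the case both instructions exist for (the two-thread setup and the names r1/r2 are illustrative, not from the question): on x86, compilers typically implement a sequentially consistent store as either xchg or mov followed by mfence, and that full barrier is exactly what forbids the store-then-load reordering below.

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void t1() {
        x.store(1, std::memory_order_seq_cst); // on x86: xchg, or mov + mfence
        r1 = y.load(std::memory_order_seq_cst);
    }

    void t2() {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    }

    int main() {
        std::thread a(t1), b(t2);
        a.join(); b.join();
        // r1 == 0 && r2 == 0 would require a store to pass a later load;
        // the full barrier (xchg or mfence) rules that out.
        assert(r1 == 1 || r2 == 1);
    }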

Acquire/release semantics with 4 threads

泄露秘密 submitted on 2019-11-27 01:50:20
Question: I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.

    #include <atomic>
    #include <thread>
    #include <assert.h>

    std::atomic<bool> x, y;
    std::atomic<int> z;

    void write_x() {
        x.store(true, std::memory_order_release);
    }
    void write_y() {
        y.store(true, std::memory_order_release);
    }
    void read_x_then_y() {
        while (!x.load(std::memory_order_acquire));
        if (y.load(std::memory_order_acquire))
            ++z;
    }
    void read_y_then_x() {
        while (!y.load(std::memory_order_acquire));
        if (x.load(std::memory_order_acquire))
            ++z;
    }

    int main() {
        x = false; y = false; z = 0;
        std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
        a.join(); b.join(); c.join(); d.join();
        assert(z.load() != 0); // can fire
    }
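The point of the listing is that the two release stores are independent: nothing orders them with respect to each other, so read_x_then_y and read_y_then_x may observe them in opposite orders, each seeing the other store as not yet done, and z stays 0. As a contrast (my sketch of the standard fix, not something in the excerpt), making every operation sequentially consistent forbids that outcome:

    // sketch: same program as above, with seq_cst throughout
    void write_x() { x.store(true, std::memory_order_seq_cst); }
    void write_y() { y.store(true, std::memory_order_seq_cst); }
    // ...and both readers load with std::memory_order_seq_cst.
    // All four threads now agree on one total order of the two stores,
    // so at least one reader sees both and increments z.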

Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?

陌路散爱 submitted on 2019-11-27 01:47:12
As we know from a previous answer to "Does it make any sense instruction LFENCE in processors x86/x86_64?", we cannot use SFENCE instead of MFENCE for Sequential Consistency. An answer there suggests that MFENCE = SFENCE + LFENCE, i.e. that LFENCE does something without which we cannot provide Sequential Consistency.

LFENCE makes this reordering impossible:

    SFENCE
    LFENCE
    MOV reg, [addr]

    -- cannot become -->

    MOV reg, [addr]
    SFENCE
    LFENCE

For example, the reordering of

    MOV [addr], reg
    LFENCE

into

    LFENCE
    MOV [addr], reg

is provided by the store-buffer mechanism, which reorders stores past later loads for performance…
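A hedged sketch of why the distinction matters (GCC/Clang inline asm on x86-64; the plain ints and names are illustrative of the hardware-level question, not from the post): in the Dekker pattern, the fence between the store and the subsequent load must block StoreLoad reordering. mfence (or any locked instruction) does that; sfence followed by lfence does not, because the store can still be sitting in the store buffer, not yet globally visible, when the later load executes.

    int X = 0, Y = 0;
    int r1, r2;

    void thread1() {
        X = 1;
        asm volatile("mfence" ::: "memory");  // full StoreLoad barrier: works
        // asm volatile("sfence\n\tlfence" ::: "memory");
        //   ^ NOT sufficient: neither fence orders the prior store
        //     against the following load
        r1 = Y;
    }

    void thread2() { // symmetric
        Y = 1;
        asm volatile("mfence" ::: "memory");
        r2 = X;
    }
    // with mfence, r1 == 0 && r2 == 0 is impossible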

Memory Barrier by lock statement

那年仲夏 submitted on 2019-11-27 00:39:53
Question: I recently read about memory barriers and the reordering issue, and now I have some confusion about it. Consider the following scenario:

    private object _object1 = null;
    private object _object2 = null;
    private bool _usingObject1 = false;

    private object MyObject
    {
        get
        {
            if (_usingObject1) { return _object1; }
            else { return _object2; }
        }
        set
        {
            if (_usingObject1) { _object1 = value; }
            else { _object2 = value; }
        }
    }

    private void Update()
    {
        _usingObject1 = true;
        MyObject = FooMethod();
        //..
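The underlying question is what ordering a lock gives you. A hedged C++ analogue (C#'s lock statement has the same acquire/release shape; the names here are mine): taking a mutex is an acquire operation and releasing it is a release operation, so ordinary reads and writes cannot move out of the critical section in either direction.

    #include <mutex>

    std::mutex m;
    int shared_value = 0; // plain data, protected by m

    void writer() {
        std::lock_guard<std::mutex> guard(m); // lock: acquire barrier
        shared_value = 42;                    // cannot be hoisted above the lock
    }                                         // unlock: release barrier publishes the write

    int reader() {
        std::lock_guard<std::mutex> guard(m); // acquire: sees everything published
        return shared_value;                  // by the previous unlock
    }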

Atomicity on x86

左心房为你撑大大i submitted on 2019-11-27 00:28:45
    8.1.2 Bus Locking
    Intel 64 and IA-32 processors provide a LOCK# signal that is asserted
    automatically during certain critical memory operations to lock the
    system bus or equivalent link. While this output signal is asserted,
    requests from other processors or bus agents for control of the bus are
    blocked. Software can specify other occasions when the LOCK semantics
    are to be followed by prepending the LOCK prefix to an instruction.

This comes from the Intel Manual, Volume 3. It sounds like atomic operations on memory are executed directly on memory (RAM). I am confused because I see "nothing…
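For concreteness (a sketch, not from the question): the LOCK prefix is what a C++ compiler emits for read-modify-write atomics on x86, and on modern CPUs it is normally implemented as a cache-line lock inside the coherence protocol rather than a literal bus lock.

    #include <atomic>

    std::atomic<int> counter{0};

    void hit() {
        // GCC/Clang on x86-64 typically compile this to:
        //     lock addl $1, counter(%rip)
        // The LOCK prefix makes the read-modify-write atomic.
        counter.fetch_add(1, std::memory_order_relaxed);
    }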

Is function call an effective memory barrier for modern platforms?

蓝咒 submitted on 2019-11-27 00:05:30
Question: In a codebase I reviewed, I found the following idiom.

    void notify(struct actor_t act) {
        write(act.pipe, "M", 1);
    }

    // thread A sending data to thread B
    void send(byte *data) {
        global.data = data;
        notify(threadB);
    }

    // in thread B event loop
    read(this.sock, &cmd, 1);
    switch (cmd) {
        case 'M': use_data(global.data); break;
        ...
    }

"Hold it," I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main…
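A hedged sketch of what an explicit version could look like (global_data, the declarations, and the release/acquire choice are my assumptions, not the codebase's): instead of relying on the opaque call as a compiler barrier and on the pipe write/read as the synchronizing event, publish the pointer with release semantics and read it with acquire semantics.

    #include <atomic>

    struct actor_t { int pipe; };
    extern actor_t threadB;
    void notify(actor_t act);      // writes 'M' to act.pipe, as in the excerpt
    void use_data(const char *p);  // hypothetical consumer

    std::atomic<const char *> global_data{nullptr}; // substitute for global.data

    // thread A
    void send(const char *data) {
        global_data.store(data, std::memory_order_release); // publish first,
        notify(threadB);                                    // then wake B
    }

    // thread B, after reading 'M' from the pipe
    void handle_message() {
        use_data(global_data.load(std::memory_order_acquire));
    }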

Is LFENCE serializing on AMD processors?

江枫思渺然 submitted on 2019-11-26 23:16:25
In recent Intel ISA documents the lfence instruction has been defined as serializing the instruction stream (preventing out-of-order execution across it). In particular, the description of the instruction includes this line:

    Specifically, LFENCE does not execute until all prior instructions have
    completed locally, and no later instruction begins execution until
    LFENCE completes.

Note that this applies to all instructions, not just memory load instructions, making lfence more than just a memory-ordering fence. Although this now appears in the ISA documentation, it isn't clear if it is…
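One reason the question matters in practice (a sketch under the assumption that lfence is dispatch-serializing, which Intel documents and which AMD enables on some parts via an MSR): fencing rdtsc so the timestamps are not reordered around the work being measured.

    #include <cstdint>
    #include <x86intrin.h> // _mm_lfence, __rdtsc (GCC/Clang)

    void do_work(); // hypothetical function being measured

    uint64_t timed_work() {
        _mm_lfence();              // no later instruction starts before this
        uint64_t t0 = __rdtsc();
        do_work();
        _mm_lfence();              // wait for the work to complete locally
        uint64_t t1 = __rdtsc();
        return t1 - t0;
    }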

Are mutex lock functions sufficient without volatile?

社会主义新天地 submitted on 2019-11-26 22:59:56
Question: A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10-year-old server CPUs. We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.

    int foo::bar() {
        //...
        // code which may or may not access _protected.
        pthread_mutex_lock(m);
        int ret = _protected;
        pthread_mutex_unlock(m);
        return ret;
    }

My concern is…
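For what the standard actually promises (my summary, hedged): POSIX specifies that pthread_mutex_lock() and pthread_mutex_unlock() synchronize memory, so the compiler and CPU barriers are part of the calls themselves, and volatile adds nothing for this purpose. A minimal sketch of the writer side that pairs with the excerpt's reader:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int _protected = 0; // no volatile needed

    void set_protected(int v) {
        pthread_mutex_lock(&m);   // acquire: implies the needed barriers
        _protected = v;           // an ordinary store suffices under the lock
        pthread_mutex_unlock(&m); // release: publishes the store
    }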

Fastest inline-assembly spinlock

孤街醉人 submitted on 2019-11-26 22:16:26
Question: I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads; for this I have chosen to use spinlocks. I have done some research and speed testing on this, and I found that most implementations are roughly equally fast:

- Microsoft's CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units
- Implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units
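As a baseline for comparison (a portable sketch, not the asker's inline-assembly version): a test-and-set spinlock over std::atomic_flag, with a pause hint in the spin loop, is what hand-written variants are usually measured against.

    #include <atomic>
    #include <immintrin.h> // _mm_pause

    class Spinlock {
        std::atomic_flag flag = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            while (flag.test_and_set(std::memory_order_acquire))
                _mm_pause(); // ease contention while spinning
        }
        void unlock() {
            flag.clear(std::memory_order_release);
        }
    };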

Dependent loads reordering in CPU

被刻印的时光 ゝ submitted on 2019-11-26 21:51:35
Question: I have been reading Memory Barriers: A Hardware View For Software Hackers, a very popular article by Paul E. McKenney. One of the things the paper highlights is that very weakly ordered processors, like the Alpha, can reorder dependent loads, which seems to be a side effect of its partitioned cache.

Snippet from the paper:

    struct el *insert(long key, long data)
    {
        struct el *p;
        p = kmalloc(sizeof(*p), GPF_ATOMIC);
        spin_lock(&mutex);
        p->next = head.next;
        p->key = key;
        p->data = data;
        smp_wmb(); /* order the initialization before publication */
        head.next = p;
        spin_unlock(&mutex);
    }
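The reader side is where dependent-load reordering bites; a sketch following the paper's discussion (head, struct el, and smp_read_barrier_depends() are the Linux kernel's primitives from the paper, and smp_read_barrier_depends() is a no-op everywhere except Alpha):

    struct el *search(long key)
    {
        struct el *p = head.next;
        while (p != &head) {
            smp_read_barrier_depends(); /* needed on Alpha: without it, the  */
            if (p->key == key)          /* loads of p->key / p->data may see */
                return p;               /* pre-initialization values despite */
            p = p->next;                /* the address dependency on p       */
        }
        return NULL;
    }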