memory-barriers | 易学教程 (Yixue Tutorial)

Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?

I wrote some lock-free code that works fine with local reads under most conditions. Does local spinning on a memory read necessarily mean I ALWAYS have to insert a memory barrier before the spinning read? (To validate this, I managed to produce a reader/writer combination in which the reader never sees the written value, under certain very specific conditions: a dedicated CPU, the process pinned to that CPU, the optimizer turned all the way up, and no other work done in the loop. So the arrows do point in that direction, but I'm not entirely sure about the cost of spinning through a memory barrier.)
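
A common way to express this in C++11 and later is to make the flag a std::atomic and spin on an acquire load, rather than inserting a separate fence before each read. The sketch below is an illustration under that assumption, not the asker's code; Mailbox, publish, consume and run_once are hypothetical names. Because every iteration performs a real atomic load, the compiler does not cache the flag in a register, which is the failure described above. On x86 an acquire load typically compiles to an ordinary MOV, so the loop executes no fence instruction at all; on ARMv8 it typically becomes a load-acquire (LDAR) rather than a full DMB barrier.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state: a plain payload published through an atomic flag.
struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // the flag the reader spins on
};

// Writer: store the payload, then publish it with a release store.
void publish(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Reader: the atomic load is re-executed on every iteration, so it cannot be
// hoisted out of the loop. Once the acquire load returns true, the payload
// written before the release store is guaranteed to be visible.
int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // optional; a pure spin is also correct
    }
    return m.payload;
}

// Runs one reader thread and one writer thread; returns what the reader saw.
int run_once(int value) {
    Mailbox m;
    int seen = -1;
    std::thread reader([&] { seen = consume(m); });
    std::thread writer([&] { publish(m, value); });
    writer.join();
    reader.join();
    return seen;
}
```

Only the writer needs release semantics and only the reader needs acquire semantics; this publish/consume pattern does not require a full two-way barrier inside the loop.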

Behavior of memory barrier in Java

Question: After reading more blogs and articles, I am now really confused about the behavior of loads and stores before and after a memory barrier. The following are two quotes from Doug Lea, from one of his clarification articles about the JMM, which are both very straightforward: Anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f. Note that it is important for both threads to access the same volatile variable in order to properly set up the happens-before
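
Java volatile accesses are often compared to C++ sequentially consistent atomics (the default memory_order_seq_cst). One behavior that separates them from plain release/acquire is the store-buffering case: a volatile write followed by a volatile read of a different variable may not be reordered, which is why Doug Lea's JSR-133 Cookbook calls for a StoreLoad barrier there. The C++ sketch below (Outcome, store_buffering_once and never_both_zero are hypothetical names) runs that litmus test. With seq_cst operations the outcome r1 == 0 && r2 == 0 is forbidden; with release stores and acquire loads it would be allowed, and it can actually be observed on x86.

```cpp
#include <atomic>
#include <thread>

// Result of one run of the store-buffering litmus test.
struct Outcome {
    int r1;
    int r2;
};

// Each thread stores to one variable, then loads the other. All accesses use
// the default memory_order_seq_cst, the closest C++ analog of Java volatile.
Outcome store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1);     // like a volatile write
        r1 = y.load();  // like a volatile read of a different field
    });
    std::thread b([&] {
        y.store(1);
        r2 = x.load();
    });
    a.join();
    b.join();
    return Outcome{r1, r2};
}

// True if r1 == 0 && r2 == 0 (forbidden under seq_cst) never shows up.
bool never_both_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        Outcome o = store_buffering_once();
        if (o.r1 == 0 && o.r2 == 0) {
            return false;
        }
    }
    return true;
}
```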

Using time stamp counter and clock_gettime for cache miss

As a follow-up to this topic, in order to calculate the memory miss latency, I wrote the following code using _mm_clflush, __rdtsc and _mm_lfence (which is based on the code from this question/answer). As you can see in the code, I first load the array into the cache. Then I flush one element, so its cache line is evicted from all cache levels. I put in _mm_lfence to preserve the order under -O3. Next, I used the time stamp counter to calculate the latency of reading array[0]. As you can see, between the two time stamps there are three instructions: two lfence instructions and one read.
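
The measurement sequence described above might look roughly like this, assuming an x86-64 target compiled with GCC or Clang (MSVC exposes the same intrinsics through <intrin.h>). Timed and time_load are hypothetical names, and the MFENCE after the flush is a conservative addition of this sketch rather than something taken from the question. The result is in TSC ticks, which on modern CPUs advance at a constant reference rate rather than at the current core clock.

```cpp
#include <cstdint>
#include <x86intrin.h>  // GCC/Clang on x86: _mm_clflush, _mm_lfence, _mm_mfence, __rdtsc

struct Timed {
    std::uint64_t ticks;  // TSC ticks (reference cycles), not core cycles
    int value;            // the value that was loaded
};

// Times a single load of *p, optionally evicting its cache line first.
Timed time_load(volatile int* p, bool flush_first) {
    if (flush_first) {
        _mm_clflush(const_cast<int*>(p));  // evict the line from all cache levels
        _mm_mfence();                      // conservative: keep the flush before the timed region
    }
    _mm_lfence();                          // earlier instructions complete before the first timestamp
    std::uint64_t start = __rdtsc();
    _mm_lfence();                          // the load does not start until start has been read
    int v = *p;                            // the load being measured
    _mm_lfence();                          // wait until the load has completed
    std::uint64_t end = __rdtsc();
    return Timed{end - start, v};
}
```

A single sample is noisy (interrupts, frequency changes, TLB misses), so in practice one would repeat the flushed and unflushed measurements many times and compare medians.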

Why can MemoryBarrier be implemented as a call to xchg?

Question: On MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms684208(v=vs.85).aspx), MemoryBarrier is implemented with an xchg instruction:

    // x86
    FORCEINLINE VOID MemoryBarrier(VOID)
    {
        LONG Barrier;
        __asm {
            xchg Barrier, eax
        }
    }

I can't find any material about this in the Software Developer's Manual. Please tell me the reason.

Answer 1: From Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: "System Programming Guide", 8.2.5 "Strengthening or Weakening the Memory-Ordering Model"
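
The Intel SDM's memory-ordering rules (section 8.2 of Volume 3) state that loads and stores are not reordered with locked instructions, and XCHG with a memory operand is locked implicitly. An XCHG on a dummy stack variable therefore acts as a full barrier, including the StoreLoad ordering that plain x86 loads and stores do not provide. In portable C++11, the closest equivalent of MemoryBarrier() is std::atomic_thread_fence(std::memory_order_seq_cst), which x86 compilers typically emit as MFENCE or as a locked read-modify-write on a stack location, depending on the compiler. The hypothetical sketch below (fenced_store_buffering is an illustrative name) uses relaxed accesses, so only the fences rule out r1 == 0 && r2 == 0.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test with relaxed accesses separated by full fences.
// Returns true if no trial produced the forbidden r1 == 0 && r2 == 0.
bool fenced_store_buffering(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // the role MemoryBarrier() plays
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            return false;  // would mean a store was reordered after the later load
        }
    }
    return true;
}
```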

Are mutex lock functions sufficient without volatile?

A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other ten-year-old server CPUs. We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.

    int foo::bar()
    {
        //...
        //code which may or may not access _protected.
        pthread_mutex_lock(m);
        int ret = _protected;
        pthread_mutex_unlock(m);
        return ret;
    }

My concern is caching. Could the compiler place a copy of _protected on the stack or in a register, and use that
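
A minimal C++11 sketch of the same pattern, using std::mutex in place of pthread_mutex_t (Counter and count_concurrently are hypothetical names). The protected variable is an ordinary int: lock and unlock are synchronization operations, so a conforming compiler cannot keep a stale copy of the variable in a register across them, and POSIX likewise lists pthread_mutex_lock() and pthread_mutex_unlock() among the functions that synchronize memory. Adding volatile would not help here, since it provides neither atomicity nor inter-thread ordering.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: value_ is protected only by m_, not by volatile.
class Counter {
public:
    void add(int n) {
        std::lock_guard<std::mutex> guard(m_);  // lock() acquires, unlock() releases
        value_ += n;                            // plain read-modify-write inside the critical section
    }
    int get() {
        std::lock_guard<std::mutex> guard(m_);
        int ret = value_;                       // same shape as foo::bar() above
        return ret;
    }
private:
    std::mutex m_;
    int value_ = 0;                             // deliberately not volatile
};

// Starts `threads` threads that each call add(1) `per_thread` times.
int count_concurrently(int threads, int per_thread) {
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                c.add(1);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return c.get();
}
```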

Do x86 SSE instructions have an automatic release-acquire order?

Question: As we know from C11 memory_order (http://en.cppreference.com/w/c/atomic/memory_order), and the same from C++11 std::memory_order (http://en.cppreference.com/w/cpp/atomic/memory_order): On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store
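
Ordinary SSE loads and stores (for example MOVAPS or MOVDQA to normal write-back memory) follow the same x86 ordering rules as integer MOVs, so release and acquire stay free for them. The notable exception in the Intel SDM is non-temporal stores (MOVNTI, MOVNTDQ and similar), which are weakly ordered. Since a release store typically compiles to a plain MOV on x86, a producer that writes data with streaming stores should issue SFENCE before publishing. The sketch below assumes an x86-64 target with SSE2; Shared, produce, consume_sum and run_stream_demo are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <emmintrin.h>  // SSE2: _mm_stream_si32 (MOVNTI); _mm_sfence comes via xmmintrin.h

constexpr int kCount = 64;

// Hypothetical producer/consumer pair sharing a buffer.
struct Shared {
    int buffer[kCount];
    std::atomic<bool> published{false};
};

void produce(Shared& s) {
    for (int i = 0; i < kCount; ++i) {
        _mm_stream_si32(&s.buffer[i], i * i);  // non-temporal store: weakly ordered
    }
    _mm_sfence();  // order the streaming stores before the flag store
    s.published.store(true, std::memory_order_release);
}

int consume_sum(Shared& s) {
    while (!s.published.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < kCount; ++i) {
        sum += s.buffer[i];
    }
    return sum;
}

// Runs one consumer and one producer; returns the consumer's sum of squares.
int run_stream_demo() {
    Shared s;
    int sum = -1;
    std::thread c([&] { sum = consume_sum(s); });
    std::thread p([&] { produce(s); });
    p.join();
    c.join();
    return sum;
}
```

Without the _mm_sfence() call, nothing in the C++ source would order the streaming stores before the flag store, because the compiler treats the release store as already sufficient on x86.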

Memory Barrier by lock statement

I read recently about memory barriers and the reordering issue, and now I have some confusion about it. Consider the following scenario:

    private object _object1 = null;
    private object _object2 = null;
    private bool _usingObject1 = false;

    private object MyObject
    {
        get
        {
            if (_usingObject1) { return _object1; }
            else { return _object2; }
        }
        set
        {
            if (_usingObject1) { _object1 = value; }
            else { _object2 = value; }
        }
    }

    private void Update()
    {
        _usingObject1 = true;
        MyObject = FooMethod();
        //..
        _usingObject1 = false;
    }

In the Update method, is the _usingObject1 = true statement always executed before

Is function call an effective memory barrier for modern platforms?

In a codebase I reviewed, I found the following idiom:

    void notify(struct actor_t act) {
        write(act.pipe, "M", 1);
    }

    // thread A sending data to thread B
    void send(byte *data) {
        global.data = data;
        notify(threadB);
    }

    // in thread B event loop
    read(this.sock, &cmd, 1);
    switch (cmd) {
        case 'M': use_data(global.data); break;
        ...
    }

"Hold it," I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main memory. If thread A and thread B run on two different processors, this scheme might fail." The
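
If the handoff should not depend on read() and write() happening to act as barriers (they do not appear in POSIX's list of memory-synchronizing functions, and the C and C++ memory models say nothing about pipes), one option is to publish the pointer through an atomic. On mainstream multiprocessors the caches are kept coherent by hardware, so the practical concern is compiler and CPU reordering rather than flushing to main memory. In the hypothetical sketch below, g_data stands in for global.data, send_message, receive_message and handoff are illustrative names, and the pipe is used purely as a wake-up signal; the release/acquire pair carries the visibility guarantee.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <unistd.h>  // POSIX: pipe(), read(), write(), close()

// Hypothetical stand-in for global.data from the question.
std::atomic<std::string*> g_data{nullptr};

// Thread A: build the data, publish the pointer with a release store, then
// wake thread B by writing one command byte to the pipe.
void send_message(int write_fd, const std::string& text) {
    g_data.store(new std::string(text), std::memory_order_release);
    char cmd = 'M';
    if (write(write_fd, &cmd, 1) != 1) {
        // error handling omitted in this sketch
    }
}

// Thread B: block on the pipe, then take the pointer with an acquire exchange.
// Looping until it is non-null means correctness never relies on the pipe.
std::string receive_message(int read_fd) {
    char cmd = 0;
    if (read(read_fd, &cmd, 1) != 1 || cmd != 'M') {
        return {};
    }
    std::string* p;
    while ((p = g_data.exchange(nullptr, std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    std::string out = *p;  // contents written by thread A are visible here
    delete p;
    return out;
}

// Runs one sender and one receiver connected by a pipe.
std::string handoff(const std::string& text) {
    int fds[2];
    if (pipe(fds) != 0) {
        return {};
    }
    std::string got;
    std::thread b([&] { got = receive_message(fds[0]); });
    std::thread a([&] { send_message(fds[1], text); });
    a.join();
    b.join();
    close(fds[0]);
    close(fds[1]);
    return got;
}
```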

Dependent loads reordering in CPU

I have been reading Memory Barriers: A Hardware View for Software Hackers, a very popular article by Paul E. McKenney. One of the things the paper highlights is that very weakly ordered processors like Alpha can reorder dependent loads, which seems to be a side effect of a partitioned cache. Snippet from the paper:

    struct el *insert(long key, long data)
    {
        struct el *p;
        p = kmalloc(sizeof(*p), GFP_ATOMIC);
        spin_lock(&mutex);
        p->next = head.next;
        p->key = key;
        p->data = data;
        smp_wmb();
        head.next = p;
        spin_unlock(&mutex);
    }

    struct el *search(long key)
    {
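
In C++11 terms, the reader side of this pattern would ideally use memory_order_consume, which orders only the loads that depend on the loaded pointer (on Alpha, exactly those dependent loads need a barrier). Mainstream compilers currently treat consume as acquire, so the sketch below uses acquire directly. It is a hypothetical rendering with a null-terminated list, not McKenney's code: el mirrors the struct in the snippet, while List, insert and search are illustrative, and node reclamation under concurrent readers is deliberately left out.

```cpp
#include <atomic>
#include <mutex>

struct el {
    el* next;
    long key;
    long data;
};

class List {
public:
    // Writer side: the mutex plays the role of spin_lock(&mutex), and the
    // release store plays the role of smp_wmb() followed by head.next = p.
    void insert(long key, long data) {
        el* p = new el{nullptr, key, data};
        std::lock_guard<std::mutex> guard(mutex_);
        p->next = head_.load(std::memory_order_relaxed);
        head_.store(p, std::memory_order_release);
    }

    // Reader side (no lock): the acquire load pairs with the release store,
    // so every node reachable from head_ has fully initialized fields.
    const el* search(long key) const {
        for (const el* p = head_.load(std::memory_order_acquire); p != nullptr; p = p->next) {
            if (p->key == key) {
                return p;
            }
        }
        return nullptr;
    }

    // Single-threaded teardown only; safe reclamation while readers are
    // active (RCU, hazard pointers) is out of scope for this sketch.
    ~List() {
        el* p = head_.load(std::memory_order_relaxed);
        while (p != nullptr) {
            el* next = p->next;
            delete p;
            p = next;
        }
    }

private:
    std::atomic<el*> head_{nullptr};
    std::mutex mutex_;
};
```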

Compiler reordering around mutex boundaries?

Question: Suppose I have my own non-inline functions LockMutex and UnlockMutex, which use some proper mutex (such as Boost's) internally. How will the compiler know not to reorder other operations with respect to calls to LockMutex and UnlockMutex? It cannot possibly know how I will implement these functions in some other compilation unit.

    void SomeClass::store(int i)
    {
        LockMutex(_m);
        _field = i; // could the compiler move this around?
        UnlockMutex(_m);
    }

PS: One is supposed to use instances of
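
Two cases cover this. While the calls stay opaque, the compiler must assume the callee may read or write any memory it could reach, including _field if the object's address has escaped (and if it has not escaped, no other thread can observe _field anyway). Once the calls are visible or inlined, the atomic operations inside the lock carry acquire and release semantics that forbid the same movement. The sketch below shows the second case with a hypothetical spinlock standing in for LockMutex/UnlockMutex; it is an assumption for illustration, not how Boost's mutex is implemented, and increment, load and increment_concurrently are added only for testing.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock standing in for the question's mutex type.
struct Mutex {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
};

void LockMutex(Mutex& m) {
    // Acquire: later loads and stores cannot be moved above this operation.
    while (m.locked.test_and_set(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

void UnlockMutex(Mutex& m) {
    // Release: earlier loads and stores cannot be moved below this operation.
    m.locked.clear(std::memory_order_release);
}

class SomeClass {
public:
    void store(int i) {
        LockMutex(_m);
        _field = i;  // stays between the acquire and the release
        UnlockMutex(_m);
    }
    void increment() {
        LockMutex(_m);
        _field = _field + 1;
        UnlockMutex(_m);
    }
    int load() {
        LockMutex(_m);
        int v = _field;
        UnlockMutex(_m);
        return v;
    }
private:
    Mutex _m;
    int _field = 0;
};

// Several threads increment under the lock; no update should be lost.
int increment_concurrently(int threads, int per_thread) {
    SomeClass obj;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&obj, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                obj.increment();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return obj.load();
}
```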