lock-free | 易学教程

Atomic operations for lock-free doubly linked list


Question: I am writing a lock-free doubly linked list based on these papers: "Efficient and Reliable Lock-Free Memory Reclamation Based on Reference Counting" by Anders Gidenstam, Marina Papatriantafilou, Håkan Sundell and Philippas Tsigas, and "Lock-free deques and doubly linked lists" by Håkan Sundell and Philippas Tsigas. For this question we can set the first paper aside. In the second paper, they use a clever way of storing a deletion flag and a pointer in a single word. (More info here) Pseudo code for this
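The idea of keeping a deletion flag and a pointer in one word can be sketched in C++ as follows. This is a minimal illustration, not the paper's code: `Node`, `MarkedLink` and its member names are hypothetical, and the sketch assumes nodes are at least 2-byte aligned so that bit 0 of a valid address is always zero and can hold the flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical node type; a real list node would also carry prev/next links.
struct Node {
    int value;
};

// One link word: the node address with the deletion flag stored in bit 0.
class MarkedLink {
public:
    // Combine a pointer and a flag into a single word.
    static std::uintptr_t pack(Node* p, bool deleted) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & 1u) == 0 && "node must be at least 2-byte aligned");
        return bits | (deleted ? 1u : 0u);
    }
    static Node* pointer(std::uintptr_t w) {
        return reinterpret_cast<Node*>(w & ~std::uintptr_t{1});
    }
    static bool deleted(std::uintptr_t w) { return (w & 1u) != 0; }

    explicit MarkedLink(Node* p = nullptr) : word_(pack(p, false)) {}

    std::uintptr_t load() const { return word_.load(std::memory_order_acquire); }

    // Swap in (new_p, new_d) only if the word still holds (exp_p, exp_d).
    // Pointer and flag are compared and replaced in one atomic step.
    bool cas(Node* exp_p, bool exp_d, Node* new_p, bool new_d) {
        std::uintptr_t expected = pack(exp_p, exp_d);
        return word_.compare_exchange_strong(expected, pack(new_p, new_d),
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire);
    }

    // Set the deletion flag, leaving the pointer unchanged.
    // Returns false if the link was already marked.
    bool try_mark() {
        std::uintptr_t w = load();
        while (!deleted(w)) {
            if (word_.compare_exchange_weak(w, w | 1u,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
                return true;
            }
        }
        return false;
    }

private:
    std::atomic<std::uintptr_t> word_;
};
```

Because the flag is part of the compared word, a CAS that expects an unmarked link fails once another thread has marked it, which is what lets a single-word CAS guard both the pointer and the deletion state.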

C++ atomic operations for lock-free structures


Question: I'm implementing a lock-free mechanism using atomic (double) compare-and-swap instructions, e.g. cmpxchg16b. I'm currently writing this in assembly and then linking it in. However, I wondered whether there is a way of getting the compiler to do this for me automatically, e.g. surround a code block with 'atomically' and have it figure out how to implement the code as an atomic instruction on the underlying processor architecture (or generate an error at compile time if the underlying arch does
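One way to let the compiler choose the instruction is `std::atomic<T>` over a small trivially copyable struct. The sketch below is illustrative only: `Versioned` and `replace_if_unchanged` are hypothetical names, and the struct is kept at 8 bytes so the example is lock-free on common 64-bit targets. Widening both fields to 64 bits gives a 16-byte object; whether that compiles to cmpxchg16b depends on the compiler and its flags (on x86-64, GCC and Clang typically need `-mcx16`, and GCC may route the call through libatomic), so checking `is_lock_free()` is advisable.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical two-field word: a value plus a version counter that is
// bumped on every successful update, which guards against ABA.
struct Versioned {
    std::uint32_t value;
    std::uint32_t version;
};

static_assert(sizeof(Versioned) == 8, "both fields must fit one 8-byte word");

// Replace the value only if the whole word still equals `seen`.
// The compiler selects the compare-and-swap instruction for the target.
bool replace_if_unchanged(std::atomic<Versioned>& cell, Versioned seen,
                          std::uint32_t new_value) {
    Versioned desired{new_value, seen.version + 1u};
    return cell.compare_exchange_strong(seen, desired);
}
```

A caller loads the cell, computes a new value, and retries when `replace_if_unchanged` returns false; a stale snapshot fails even if the value field happens to match again, because the version differs.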

Acquire/release semantics with non-temporal stores on x64


I have something like: if (f = acquire_load() == ) { ... use Foo } and: auto f = new Foo(); release_store(f) You could easily imagine an implementation of acquire_load and release_store that uses atomic with load(memory_order_acquire) and store(memory_order_release). But now what if release_store is implemented with _mm_stream_si64, a non-temporal write, which is not ordered with respect to other stores on x64? How to get the same semantics? I think the following is the minimum required: atomic<Foo*> gFoo; Foo* acquire_load() { return gFoo.load(memory_order_relaxed); } void release_store(Foo*

Dependent loads reordering in CPU


Question: I have been reading "Memory Barriers: A Hardware View For Software Hackers", a very popular article by Paul E. McKenney. One of the things the paper highlights is that very weakly ordered processors such as Alpha can reorder dependent loads, which seems to be a side effect of a partitioned cache. Snippet from the paper:

 1 struct el *insert(long key, long data)
 2 {
 3     struct el *p;
 4     p = kmalloc(sizeof(*p), GFP_ATOMIC);
 5     spin_lock(&mutex);
 6     p->next = head.next;
 7     p->key = key;
 8     p->data = data;
 9
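In portable C++ the same publish-then-traverse pattern is usually written with a release store and an acquire load, which orders the dependent loads on every architecture, Alpha included. The following is a minimal sketch under stated assumptions, not the paper's kernel code: `El`, `head`, `insert` and `search` are illustrative names, writers are assumed to be serialized externally (the paper uses a spin lock for that), and nodes are never freed.

```cpp
#include <atomic>

// Illustrative list element, mirroring the fields in the snippet.
struct El {
    El* next;
    long key;
    long data;
};

// Head of the list. Assumption: only one writer at a time calls insert().
std::atomic<El*> head{nullptr};

void insert(long key, long data) {
    // Fully initialize the node before it becomes reachable.
    El* p = new El{head.load(std::memory_order_relaxed), key, data};
    // Release store: the field writes above are visible to any reader
    // that observes this pointer through an acquire load.
    head.store(p, std::memory_order_release);
}

const El* search(long key) {
    // Acquire load pairs with the release store in insert(), so the
    // dependent reads of p->key and p->next see initialized data.
    const El* p = head.load(std::memory_order_acquire);
    while (p != nullptr && p->key != key) {
        p = p->next;
    }
    return p;
}
```

`std::memory_order_consume` was designed to express exactly the dependency ordering discussed in the paper, but mainstream compilers currently treat it as acquire, so the sketch uses acquire directly.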

Is there any compiler barrier which is equal to asm("" ::: "memory") in C++11?


Question: My test code is below, and I found that only memory_order_seq_cst prevented the compiler from reordering. #include <atomic> using namespace std; int A, B = 1; void func(void) { A = B + 1; atomic_thread_fence(memory_order_seq_cst); B = 0; } Other choices, such as memory_order_release and memory_order_acq_rel, did not generate any compiler barrier at all. I think they must work with an atomic variable, as below. #include <atomic> using namespace std; atomic<int> A(0); int B = 1; void func(void) { A
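The closest standard counterpart to a compiler-only barrier is `std::atomic_signal_fence`, which constrains the compiler without emitting a hardware fence instruction. The sketch below reuses the question's `A`, `B` and `func` for illustration. Note that the standard specifies this fence in terms of a thread and a signal handler running on that same thread; treating it as a general compiler barrier relies on how mainstream compilers implement it, which is an assumption here.

```cpp
#include <atomic>

int A = 0;
int B = 1;

void func() {
    A = B + 1;
    // Compiler-only fence: no CPU fence instruction is generated, but the
    // compiler is expected not to move memory accesses across this point.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    B = 0;
}
```

Unlike `std::atomic_thread_fence`, this gives no ordering guarantee between threads on weakly ordered hardware, so it is not a substitute when other cores observe `A` and `B`.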

Is there a production ready lock-free queue or hash implementation in C++ [closed]


I've been googling quite a bit for a lock-free queue in C++. I found some code and some trials, but nothing that I was able to compile. A lock-free hash would also be welcome. SUMMARY: So far I have no positive answer. There is no "production ready" library, and amazingly none of the existing libraries complies with the API of STL containers. As of 1.53, Boost provides a set of lock-free data structures, including queues, stacks and single-producer/single-consumer queues (i.e. ring buffers). Steve Gilham The starting point would be either of Herb Sutter's DDJ articles for either a single

Shared-memory IPC synchronization (lock-free)


Consider the following scenario: Requirements: Intel x64 Server (multiple CPU-sockets => NUMA) Ubuntu 12, GCC 4.6 Two processes sharing large amounts of data over (named) shared-memory Classical producer-consumer scenario Memory is arranged in a circular buffer (with M elements) Program sequence (pseudo code): Process A (Producer): int bufferPos = 0; while( true ) { if( isBufferEmpty( bufferPos ) ) { writeData( bufferPos ); setBufferFull( bufferPos ); bufferPos = ( bufferPos + 1 ) % M; } } Process B (Consumer): int bufferPos = 0; while( true ) { if( isBufferFull( bufferPos ) ) { readData(
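The producer and consumer loops in the pseudo code can be sketched with one atomic "full" flag per slot. This is a minimal single-producer/single-consumer illustration under stated assumptions: `Slot`, `Ring`, `try_produce` and `try_consume` are hypothetical names, the payload is a single `int`, and the ring is an ordinary object here. Placing it in a shared-memory segment would additionally require constructing it in that mapping and atomics that are lock-free on the target.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t M = 4;  // number of slots in the circular buffer

struct Slot {
    std::atomic<bool> full{false};  // false: producer owns the slot; true: consumer owns it
    int data = 0;                   // placeholder payload
};

struct Ring {
    Slot slots[M];
};

// Producer side: write the slot at `pos` if it is empty, then advance.
bool try_produce(Ring& r, std::size_t& pos, int value) {
    Slot& s = r.slots[pos];
    if (s.full.load(std::memory_order_acquire)) return false;  // not yet consumed
    s.data = value;
    // Release: the payload write becomes visible before the flag flips.
    s.full.store(true, std::memory_order_release);
    pos = (pos + 1) % M;
    return true;
}

// Consumer side: read the slot at `pos` if it is full, then advance.
bool try_consume(Ring& r, std::size_t& pos, int& out) {
    Slot& s = r.slots[pos];
    if (!s.full.load(std::memory_order_acquire)) return false;  // nothing to read
    out = s.data;
    // Release: the payload read completes before the slot is handed back.
    s.full.store(false, std::memory_order_release);
    pos = (pos + 1) % M;
    return true;
}
```

Each side keeps its own position and only ever touches a slot it currently owns, so no lock is needed; the flag's acquire/release pair is what transfers ownership of the payload.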

Lock-free Progress Guarantees


Anecdotally, I've found that a lot of programmers mistakenly believe that "lock-free" simply means "concurrent programming without mutexes". Usually, there's also a correlated misunderstanding that the purpose of writing lock-free code is for better concurrent performance. Of course, the correct definition of lock-free is actually about progress guarantees . A lock-free algorithm guarantees that at least one thread is able to make forward progress regardless of what any other threads are doing. This means a lock-free algorithm can never have code where one thread is depending on another thread
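A compare-and-swap retry loop is the standard illustration of this guarantee. In the sketch below (`atomic_store_max` is a hypothetical helper, not from the text), a thread retries only when the stored value changed under it, which means some other thread completed an update in the meantime. On architectures where `compare_exchange_weak` can fail spuriously that argument is slightly looser, so treat this as an illustration of the idea rather than a proof.

```cpp
#include <atomic>

// Raise `target` to at least `candidate`.
// Returns true if this call changed the stored value.
bool atomic_store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load(std::memory_order_relaxed);
    while (seen < candidate) {
        // On failure, `seen` is refreshed with the current value, so the
        // loop re-checks against whatever another thread just stored.
        if (target.compare_exchange_weak(seen, candidate,
                                         std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;  // the stored value was already >= candidate
}
```

No thread ever waits for a specific other thread to finish: if one thread is suspended mid-loop, the others still complete their updates, which is the property a mutex-protected version cannot offer.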

Portable Compare And Swap (atomic operations) C/C++ library?


Question: Is there any small library that wraps various processors' CAS-like operations into macros or functions that are portable across multiple compilers? PS. The atomic.hpp library is inside the boost::interprocess::detail namespace. The author refuses to make it a public, well-maintained library. Let's reopen the question and see if there are any other options. Answer 1: Intel Threading Building Blocks has a nice portable atomic<T> template which does what you want. But whether it is a small library or

atomic operation cost


Question: What is the cost of an atomic operation (any of compare-and-swap or atomic add/decrement)? How many cycles does it consume? Will it pause other processors on SMP or NUMA, or will it block memory accesses? Will it flush the reorder buffer in an out-of-order CPU? What effects will it have on the cache? I'm interested in modern, popular CPUs: x86, x86_64, PowerPC, SPARC, Itanium. Answer 1: I have looked for actual data for the past few days, and found nothing. However, I did some research, which compares the cost