I'm playing with the std::atomic structures and wrote this lock-free multi-producer multi-consumer queue, which I'm attaching here. The idea for the queue is based on two stacks.
I believe I was able to crack this one. No livelock at 1,000,000 writes/reads for queue sizes from 2 to 1024, and for anywhere from 1 producer and 1 consumer up to 100 producers / 100 consumers.
Here's the solution. The trick is not to use cell->m_next directly in the compare-and-swap (the same applies to the producer code, by the way) and to require stricter memory-ordering rules:
This seems to confirm my suspicion that the problem was compiler reordering of the reads and writes. Here's the code:
bool push(const TData& data)
{
    // Pop a free node off the producer (free-node) stack.
    CellNode* cell = m_produceHead.load(std::memory_order_acquire);
    if(cell == NULL)
        return false;
    // Note: the failure ordering of a compare-exchange must not be
    // memory_order_release, so acquire is used on both paths here.
    while(!std::atomic_compare_exchange_strong_explicit(&m_produceHead,
                                                        &cell,
                                                        cell->m_next,
                                                        std::memory_order_acquire,
                                                        std::memory_order_acquire))
    {
        if(!cell)
            return false;
    }
    m_data[cell->m_idx] = data;
    // Push the filled node onto the consumer stack. Success uses release
    // so the write to m_data above is published to the thread that later
    // pops this node with an acquire load.
    CellNode* curHead = m_consumeHead;
    cell->m_next = curHead;
    while(!std::atomic_compare_exchange_strong_explicit(&m_consumeHead,
                                                        &curHead,
                                                        cell,
                                                        std::memory_order_release,
                                                        std::memory_order_acquire))
    {
        cell->m_next = curHead;
    }
    return true;
}
bool pop(TData& data)
{
    // Pop a filled node off the consumer stack.
    CellNode* cell = m_consumeHead.load(std::memory_order_acquire);
    if(cell == NULL)
        return false;
    while(!std::atomic_compare_exchange_strong_explicit(&m_consumeHead,
                                                        &cell,
                                                        cell->m_next,
                                                        std::memory_order_acquire,
                                                        std::memory_order_acquire))
    {
        if(!cell)
            return false;
    }
    data = m_data[cell->m_idx];
    // Return the node to the producer (free-node) stack, releasing it
    // for reuse by a later push.
    CellNode* curHead = m_produceHead;
    cell->m_next = curHead;
    while(!std::atomic_compare_exchange_strong_explicit(&m_produceHead,
                                                        &curHead,
                                                        cell,
                                                        std::memory_order_release,
                                                        std::memory_order_acquire))
    {
        cell->m_next = curHead;
    }
    return true;
}
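For reference, a stress harness along these lines matches the test setup described above. It is a sketch only: the queue's class name and constructor aren't shown in this post, so LockFreeQueue<int> with a capacity argument is assumed.

#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

int main()
{
    const int kOps = 1000000;                 // total writes and reads
    const int kProducers = 100, kConsumers = 100;
    LockFreeQueue<int> q(1024);               // assumed class name / constructor

    std::atomic<std::int64_t> pushed(0), popped(0);
    std::vector<std::thread> threads;

    for(int p = 0; p < kProducers; ++p)
        threads.emplace_back([&]{
            for(int i = 0; i < kOps / kProducers; ++i)
                while(!q.push(i)) {}          // spin until a free node is available
            pushed += kOps / kProducers;
        });
    for(int c = 0; c < kConsumers; ++c)
        threads.emplace_back([&]{
            int v;
            for(int i = 0; i < kOps / kConsumers; ++i)
                while(!q.pop(v)) {}           // spin until an item is available
            popped += kOps / kConsumers;
        });

    for(auto& t : threads)
        t.join();
    assert(pushed == popped);                 // every pushed item was consumed
    return 0;
}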
I see a few problems with your queue implementation:
It's not a queue, it's a stack: the most recent item pushed is the first item popped. Not that there's anything wrong with stacks, but it's confusing to call it a queue. In fact it is two lock-free stacks: one stack that is initially populated with the array of nodes, and another stack that stores actual data elements using the first stack as a list of free nodes.
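Concretely, the structure being described looks something like the sketch below. Only the members used by the posted push/pop (m_produceHead, m_consumeHead, m_data, CellNode::m_idx, CellNode::m_next) come from the code above; the class name and everything else are assumed for illustration.

#include <atomic>
#include <cstddef>

template<typename TData, std::size_t N>
class TwoStackPool
{
    struct CellNode
    {
        std::size_t m_idx;   // index of this node's slot in m_data
        CellNode*   m_next;  // intrusive link within whichever stack owns the node
    };

    CellNode m_nodes[N];
    TData    m_data[N];      // payload slots, addressed via CellNode::m_idx

    std::atomic<CellNode*> m_produceHead;  // free-node stack: initially all N nodes
    std::atomic<CellNode*> m_consumeHead;  // filled-node stack: initially empty

public:
    TwoStackPool()
    {
        // Thread every node onto the free stack.
        for(std::size_t i = 0; i < N; ++i)
        {
            m_nodes[i].m_idx  = i;
            m_nodes[i].m_next = (i + 1 < N) ? &m_nodes[i + 1] : NULL;
        }
        m_produceHead.store(&m_nodes[0]);
        m_consumeHead.store(NULL);
    }
};

Since push pops from one head and pushes onto the other (and pop does the mirror image), the filled-node stack is LIFO: hence a stack, not a queue.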
There is a data race on CellNode::m_next in both push and pop (unsurprisingly, since they both do the same thing, i.e., pop a node from one stack and push that node onto the other). Say two threads simultaneously enter e.g. pop and both read the same value from m_consumeHead. Thread 1 races ahead, successfully popping, and sets data. Then Thread 1 writes the value of m_produceHead into cell->m_next while Thread 2 is simultaneously reading cell->m_next to pass to std::atomic_compare_exchange_strong_explicit. The simultaneous non-atomic read and write of cell->m_next by two threads is by definition a data race.
This is what is known as a "benign" race in the concurrency literature: a stale/invalid value is read, but never gets used. If you are confident that your code will never need to run on an architecture where it could cause fiery explosions you may ignore it, but for strict conformance with the Standard memory model you need to make m_next an atomic and use at least memory_order_relaxed reads to eliminate the data race.
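A sketch of what that change looks like, pulled out into free functions (popNode/pushNode are my names, not from the posted code):

#include <atomic>
#include <cstddef>

struct CellNode
{
    std::size_t            m_idx;
    std::atomic<CellNode*> m_next;  // now atomic: no data race with concurrent readers
};

// The pop-side loop from the posted code, rewritten against the atomic
// m_next. A relaxed load suffices: a stale value can only be observed
// when the head has already changed, in which case the compare-exchange
// fails and the value is discarded.
inline CellNode* popNode(std::atomic<CellNode*>& head)
{
    CellNode* cell = head.load(std::memory_order_acquire);
    while(cell != NULL &&
          !head.compare_exchange_strong(cell,
                                        cell->m_next.load(std::memory_order_relaxed),
                                        std::memory_order_acquire,
                                        std::memory_order_acquire))
    {
        // 'cell' was reloaded by the failed compare-exchange; retry.
    }
    return cell;  // NULL if the stack was empty
}

// The push side stores through the atomic as well.
inline void pushNode(std::atomic<CellNode*>& head, CellNode* cell)
{
    CellNode* curHead = head.load(std::memory_order_relaxed);
    do
    {
        cell->m_next.store(curHead, std::memory_order_relaxed);
    } while(!head.compare_exchange_strong(curHead, cell,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
}

Note this removes the undefined behavior but not the ABA problem described next.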
ABA. The correctness of your compare-exchange loops is based on the premise that an atomic pointer (e.g., m_produceHead and m_consumeHead) having the same value at both the initial load and the later compare-exchange implies that the pointee object must therefore be unchanged as well. This premise does not hold in any design in which it is possible to recycle an object faster than some thread makes a trip through its compare-exchange loop. Consider this sequence of events:
1. Thread 1 enters pop and reads the value of m_consumeHead and m_consumeHead->m_next, but blocks before calling the compare-exchange.
2. Thread 2 successfully pops that node from m_consumeHead and blocks as well.
3. Thread 3 pushes several nodes onto m_consumeHead.
4. Thread 2 unblocks and pushes the original node onto m_produceHead.
5. Thread 3 pops that node from m_produceHead, and pushes it back onto m_consumeHead.
6. Thread 1 finally unblocks and calls the compare-exchange, which succeeds since the value of m_consumeHead is the same. It pops the node - which is all well and good - but sets m_consumeHead to the stale m_next value it read back in step 1. All the nodes pushed by Thread 3 in the meantime are leaked.
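The answer stops at the diagnosis, but for illustration, one standard mitigation is to pair each head pointer with a modification counter, so a node that has been popped and pushed back no longer compares equal to the stale snapshot. A sketch (names and layout are mine, not from the original):

#include <atomic>
#include <cstddef>
#include <cstdint>

struct CellNode
{
    std::size_t m_idx;
    CellNode*   m_next;  // left plain for brevity; the data-race point above still applies
};

// Head pointer paired with a counter bumped on every successful pop:
// the recycled node in step 6 would now carry a different tag, so
// Thread 1's stale compare-exchange fails instead of succeeding.
struct TaggedHead
{
    CellNode*      ptr;
    std::uintptr_t tag;
};

inline CellNode* popNode(std::atomic<TaggedHead>& head)
{
    TaggedHead cur = head.load(std::memory_order_acquire);
    while(cur.ptr != NULL)
    {
        TaggedHead next = { cur.ptr->m_next, cur.tag + 1 };
        if(head.compare_exchange_strong(cur, next,
                                        std::memory_order_acquire,
                                        std::memory_order_acquire))
            return cur.ptr;
        // 'cur' was reloaded on failure; retry with the fresh pointer/tag.
    }
    return NULL;
}

On x86-64 this needs a 16-byte compare-exchange (e.g., -mcx16 with GCC/Clang) to stay lock-free; hazard pointers or epoch-based reclamation are the usual alternatives when nodes are freed rather than recycled.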