Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?
I wrote some lock-free code that works fine with local reads, under most conditions. Does local spinning on a memory read necessarily imply I have to ALWAYS insert a memory barrier before the spinning read? (To validate this, I managed to produce a reader/writer combination which results in a reader never seeing the written value, under certain very specific conditions--dedicated CPU, process attached to CPU, optimizer turned all the way up, no other work done in the loop--so the arrows do point in that direction, but I'm not entirely sure about the cost of spinning through a memory barrier.)