I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads, for this I hav
Wikipedia has a good article on spinlocks; here is the x86 implementation:
http://en.wikipedia.org/wiki/Spinlock#Example_implementation
Notice that their implementation doesn't use the "lock" prefix. On x86 the prefix is redundant for the "xchg" instruction when one operand is in memory, because the instruction implicitly has lock semantics, as discussed in this Stack Overflow question:
On a multicore x86, is a LOCK necessary as a prefix to XCHG?
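For a C++ view of the same idea, here is a minimal sketch of a test-and-set spinlock built on `std::atomic<bool>::exchange`, which x86 compilers typically lower to a plain `xchg` with no explicit `lock` prefix. The names `SpinLock` and `run_counter_demo` are made up for illustration and do not come from the Wikipedia article.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal test-and-set spinlock (illustration only).
class SpinLock {
public:
    void lock() {
        // exchange() atomically stores true and returns the previous value.
        // The lock is ours only if the previous value was false.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Busy-wait until the current owner releases the lock.
        }
    }

    void unlock() {
        // Release ordering publishes the writes made inside the critical section.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo helper: several threads increment a plain counter
// under the spinlock and the final value is returned.
long run_counter_demo(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;  // deliberately non-atomic; protected by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000 if the lock provides mutual exclusion. Build with thread support enabled (for example `-pthread` with GCC or Clang).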
REP NOP is the encoding of the PAUSE instruction, so the two are aliases. You can learn more about that here:
How does x86 pause instruction work in spinlock *and* can it be used in other scenarios?
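As a rough sketch of where PAUSE fits, the spin loop below waits on a plain load and issues `_mm_pause()` (the compiler intrinsic for PAUSE) between checks, retrying the exchange only once the lock looks free. The names `cpu_relax`, `PausingSpinLock`, `Pair` and `run_pause_demo` are hypothetical, and the non-x86 fallback to `std::this_thread::yield()` is an assumption made only to keep the example portable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// On x86, emit PAUSE (encoded as REP NOP) inside the spin loop.
inline void cpu_relax() { _mm_pause(); }
#else
// Assumed fallback for other architectures: give up the time slice.
inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical test-and-test-and-set spinlock that pauses while waiting.
class PausingSpinLock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock; success if it was previously free.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Spin on a read-only load so waiters do not keep writing the line.
            while (locked_.load(std::memory_order_relaxed)) cpu_relax();
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// A small structure shared between threads; both fields must stay equal.
struct Pair { int a; int b; };

// Hypothetical demo helper: returns true if no update was lost or torn.
bool run_pause_demo(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    PausingSpinLock lock;
    Pair shared{0, 0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++shared.a;
                ++shared.b;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.a == num_threads * iterations && shared.b == shared.a;
}
```

Whether the pause actually helps depends on the hardware and on how contended the lock is, so it is worth measuring in the real application.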
On the issue of memory barriers, here's everything you might want to know:
Memory Barriers: a Hardware View for Software Hackers by Paul E. McKenney
http://irl.cs.ucla.edu/~yingdi/paperreading/whymb.2010.06.07c.pdf
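To connect the barrier discussion to C++, here is a small sketch of handing a structure from one thread to another with a release store and an acquire load instead of a lock. It is an illustration under assumptions, not something taken from the paper: `Message` and `pass_message` are made-up names, and the pattern only fits a single hand-off from one producer to one consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical small payload copied between threads.
struct Message {
    int id;
    double value;
};

// Hypothetical helper: the calling thread publishes a Message and a
// consumer thread copies it out once the ready flag becomes visible.
Message pass_message(int id, double value) {
    Message slot{};                  // plain, non-atomic payload
    std::atomic<bool> ready{false};  // flag that carries the ordering
    Message received{};

    std::thread consumer([&] {
        // The acquire load pairs with the release store below: once it
        // observes true, the earlier write to slot is visible here too.
        while (!ready.load(std::memory_order_acquire)) {
            // Spin until the producer publishes the payload.
        }
        received = slot;
    });

    slot = Message{id, value};                     // write the payload first
    ready.store(true, std::memory_order_release);  // then publish it

    consumer.join();
    assert(received.id == id);
    return received;
}
```

On x86 the acquire load and release store usually compile to ordinary `mov` instructions, because the hardware already orders loads with loads and stores with stores. The annotations are still needed to stop the compiler from reordering the accesses and to keep the code correct on weaker architectures.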