I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads, for this I hav
Just asking:
Before you dig that deep into spinlocks and nearly lockless data structures:
Have you - in your benchmarks and your application - made sure that the competing threads are guaranteed to run on different cores?
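One way to get that guarantee is to pin each competing thread to its own core. Here is a minimal sketch, assuming MSVC's `std::thread` on Windows (where `native_handle()` is a Win32 `HANDLE`) or glibc on Linux. The helper names `has_multiple_cores` and `pin_to_core` are made up for illustration:

```cpp
#include <cassert>
#include <thread>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <pthread.h>
#include <sched.h>
#endif

// Hypothetical helper: true only if the runtime reports at least two
// logical cores. hardware_concurrency() may return 0 when it cannot tell,
// which is treated as "not guaranteed" here.
bool has_multiple_cores() {
    return std::thread::hardware_concurrency() >= 2;
}

// Hypothetical helper: ask the OS to run thread `t` only on logical core
// `core`. Returns false if the request is rejected or the platform is not
// handled by this sketch.
bool pin_to_core(std::thread& t, unsigned core) {
    assert(t.joinable());  // native_handle() needs a live thread object
#if defined(_WIN32)
    // A single affinity mask only covers one processor group (up to 64 cores).
    if (core >= sizeof(DWORD_PTR) * 8) return false;
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << core;
    return SetThreadAffinityMask(static_cast<HANDLE>(t.native_handle()), mask) != 0;
#elif defined(__linux__)
    if (core >= static_cast<unsigned>(CPU_SETSIZE)) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
#else
    (void)t;
    (void)core;
    return false;  // platform not covered by this sketch
#endif
}
```

Typical use would be to call `pin_to_core(producer, 0)` and `pin_to_core(consumer, 1)` right after creating the threads, and to check both return values plus `has_multiple_cores()` before trusting any spinlock benchmark.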
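If not, you may end up with a program that works great on your development machine but sucks/fails hard in the field, because one hardware thread has to run both the locker and the unlocker of your spinlock: the waiter spins away its time slice while the holder can't run to release the lock.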
To give you a figure: On Windows you have a standard time slice of 10 milliseconds. If you don't make sure that two physical threads are involved in locking/unlocking, you'll end up with around 500 locks/unlocks per second, and this result will be very meh
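If you can't guarantee separate cores, one common mitigation is to spin only briefly and then give up the rest of the time slice, so a lock holder on the same core gets a chance to run. This is a rough sketch of that idea, not something measured against the figures above. `YieldingSpinlock`, `contended_count` and the spin limit of 64 are all assumptions picked for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock that yields after a bounded number of failed tries.
class YieldingSpinlock {
public:
    void lock() {
        int spins = 0;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins >= kSpinLimit) {
                // The holder may be waiting for this very core: step aside.
                std::this_thread::yield();
                spins = 0;
            }
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    static constexpr int kSpinLimit = 64;  // arbitrary value, tune by measuring
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Small demo helper: `threads` threads each increment a shared counter
// `iters` times under the lock and the final count is returned.
long contended_count(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    YieldingSpinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int j = 0; j < iters; ++j) {
                std::lock_guard<YieldingSpinlock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& t : pool) t.join();
    return counter;
}
```

Because the class provides `lock()` and `unlock()`, it works with `std::lock_guard`. Whether yielding actually helps depends on the scheduler, so benchmark it on the target machines rather than only on the development box.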