Does anyone know of published benchmarks of the overhead of locking instead of relying on certainly atomic operations/intrinsics (on a multiprocessor system) only?
I
Answers to this sort of question are highly complex: lock-free code frequently spins rather than waiting under contention, and so under high contention may run significantly slower than it would if the threads just blocked themselves in a queue behind a lock. In addition, lock-free code can still spend a lot of time issuing memory barriers, and so might have unexpected performance characteristics on different hardware and on different platforms.
So, the old standby answer to performance questions rears its head again: measure both of them in your situation, and pick the faster one.
Basically, I need this to justify a complex lock-free implementation instead of a straightforward locking one where starvation isn’t an issue.
Well, all that said, unless this data structure is at the absolute center of your massive-scale multithreaded application, I would go with a lock-based solution, with as few locks as possible. The bugs will be so much easier to find that it will be worth it. Where threading is involved, I personally find it very difficult to justify any kind of complex implementation, lock-free or otherwise.