Raymond Chen has been doing a huge series on lock-free algorithms. Beyond the simple cases of the InterlockedXxx functions, it seems like the prevailing pattern
Lock-free algorithms can absolutely be faster than their blocking counterparts, but the reverse can be true as well. Assuming the lock-free implementation performs better than its locking counterpart, the only limiting factor is contention.
Take the two Java classes ConcurrentLinkedQueue and LinkedBlockingQueue. Under moderate, real-world contention the CLQ outperforms the LBQ by a good amount. Under heavy contention, the LBQ's ability to suspend waiting threads lets it perform better.
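To make the difference concrete, here is a minimal sketch of how a consumer waits on each queue. The class name `QueueSketch` and the two `drain` helpers are made up for illustration, and this is not a benchmark; it only shows that `poll()` on the CLQ returns immediately (so the consumer has to retry), while `take()` on the LBQ suspends the thread until an element arrives.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSketch {
    // Hypothetical helper: consume `expected` items from the lock-free queue.
    // poll() never blocks; on an empty queue it returns null and we retry.
    static int drainLockFree(ConcurrentLinkedQueue<Integer> q, int expected) {
        int sum = 0;
        int seen = 0;
        while (seen < expected) {
            Integer item = q.poll();
            if (item == null) {
                Thread.yield(); // nothing there yet: stay runnable and try again
                continue;
            }
            sum += item;
            seen++;
        }
        return sum;
    }

    // Hypothetical helper: consume `expected` items from the blocking queue.
    // take() suspends the calling thread while the queue is empty.
    static int drainBlocking(LinkedBlockingQueue<Integer> q, int expected) {
        int sum = 0;
        try {
            for (int seen = 0; seen < expected; seen++) {
                sum += q.take();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 1000;
        ConcurrentLinkedQueue<Integer> clq = new ConcurrentLinkedQueue<>();
        LinkedBlockingQueue<Integer> lbq = new LinkedBlockingQueue<>();

        // One producer feeds both queues with the values 1..n.
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= n; i++) {
                clq.offer(i);
                lbq.offer(i);
            }
        });
        producer.start();

        System.out.println("CLQ sum: " + drainLockFree(clq, n));
        System.out.println("LBQ sum: " + drainBlocking(lbq, n));
        producer.join();
    }
}
```

The retry loop in `drainLockFree` is the part that gets expensive when many threads pile up on the queue, whereas the suspended threads in `drainBlocking` stay out of the way until there is work for them.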
I disagree with user237815. The synchronized keyword doesn't carry as much overhead as it once did, but compared to a single CAS in a lock-free algorithm it still has a good amount of overhead associated with it.
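As a rough illustration of the two approaches, here is the same counter written both ways. The names `CounterSketch`, `LockedCounter`, `CasCounter` and `hammer` are invented for this example. The explicit retry loop is only there to make the CAS visible; in real code `AtomicInteger.incrementAndGet()` does the job. This sketch checks correctness only; measuring the actual overhead would need a proper harness such as JMH.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterSketch {
    // Lock-based: every call acquires and releases the object's monitor.
    static final class LockedCounter {
        private int value;

        synchronized int increment() {
            return ++value;
        }

        synchronized int get() {
            return value;
        }
    }

    // Lock-free: read the value, compute the next one, publish it with one CAS.
    // If another thread got in first, the CAS fails and we simply retry.
    static final class CasCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            while (true) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
            }
        }

        int get() {
            return value.get();
        }
    }

    // Hypothetical helper: run `op` perThread times on each of `threads` threads.
    static void hammer(Runnable op, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    op.run();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        CasCounter cas = new CasCounter();
        hammer(locked::increment, 4, 100_000);
        hammer(cas::increment, 4, 100_000);
        // Both counters should end up at threads * perThread.
        System.out.println(locked.get() + " " + cas.get());
    }
}
```

With no contention the CAS succeeds on the first try, which is where the lock-free version has its advantage. The more threads collide, the more often the loop in `CasCounter.increment()` has to go around again.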