Raymond Chen has been doing a huge series on lock-free algorithms. Beyond the simple cases of the InterlockedXxx functions, it seems like the prevailing pattern
Lock-free isn't necessarily any faster, but it can eliminate the possibility of deadlock or livelock, so you can guarantee that your program will always make progress toward finishing. With locks, it's difficult to make any such guarantee -- it's all too easy to miss some possible execution sequence that results in a deadlock.
Past that, it all depends. At least in my experience, differences in speed tend to depend more on the skill of whoever writes the implementation than on whether it uses locks or not.