The relative speed of the individual operations is largely a non-issue. What matters is the difference in scalability between lock-based and nonblocking algorithms. And if you're running on a one- or two-core system, stop thinking about such things.
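Nonblocking algorithms generally scale better because they have shorter "critical sections" than lock-based algorithms: the window in which threads can interfere with one another shrinks to a single atomic operation (typically a compare-and-swap), and a thread that gets descheduled mid-update does not hold up everyone else.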
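To make the "critical section" point concrete, here is a minimal sketch in Java (assuming a JVM context; the names `CounterSketch`, `LockedCounter`, `CasCounter` and `run` are made up for illustration). It is not a benchmark. It only shows where contention sits in each style: the lock-based counter holds a monitor for the whole read-modify-write, while the nonblocking one contends only on a single `compareAndSet` and retries on failure.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; all class and method names here are hypothetical.
class CounterSketch {

    interface Counter {
        void increment();
        long get();
    }

    // Lock-based: the entire read-modify-write runs while holding the monitor.
    // A thread that is descheduled inside increment() blocks every other caller.
    static final class LockedCounter implements Counter {
        private long value;

        public synchronized void increment() {
            value++;
        }

        public synchronized long get() {
            return value;
        }
    }

    // Nonblocking: the only contended step is the single compareAndSet.
    // A thread that loses the race simply re-reads and retries.
    static final class CasCounter implements Counter {
        private final AtomicLong value = new AtomicLong();

        public void increment() {
            long current;
            do {
                current = value.get();
            } while (!value.compareAndSet(current, current + 1));
        }

        public long get() {
            return value.get();
        }
    }

    // Runs `threads` workers that each call increment() `perThread` times,
    // waits for all of them, and returns the final count.
    static long run(Counter counter, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("locked: " + run(new LockedCounter(), 4, 100_000));
        System.out.println("cas:    " + run(new CasCounter(), 4, 100_000));
    }
}
```

Both counters end up with the same total; the difference only shows up as throughput under contention on a machine with enough cores, which is the scalability point above. In real code you would just call `AtomicLong.incrementAndGet()`; the explicit retry loop is spelled out here to show where the contention is.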