I have read a lot of material on threading, and all the synchronization mechanisms involved. I also understand the dangers of not doing it properly.
When mutable shared data requires a lock, you could lose the benefits of parallelism.
Let's first make the simplification that context switches are free and locks are cheap (neither of which is exactly true - we will address these points at the end).
Think about the case of threads that share no data: they can run independently without worrying about each other's state. With two such threads, your algorithm can run up to twice as fast (assuming two cores are available).
Next, introduce some piece of shared data which changes over time. By definition, no two threads can be modifying/reading this data at the same time. That means if two threads happen to want access to this data, you no longer have concurrent operation: they must work in a serialized (synchronized) manner. The more frequently this contention happens, the more your application will behave like a single-threaded application than a dual-threaded app.
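To make the contention concrete, here is a minimal Python sketch (names like `worker` are just for illustration): two threads increment one shared counter, and every increment has to pass through the same lock, so the work in the critical section is effectively serial no matter how many threads you add.

```python
import threading

counter = 0                      # shared mutable data
lock = threading.Lock()          # guards every access to counter

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:               # only one thread at a time past this point
            counter += 1         # the critical section runs serially

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 - correct, but the increments did not overlap
```

The answer is correct precisely because the lock serialized the threads; that serialization is the lost parallelism being described.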
So when it is said that "locks are an expensive operation", I take it to mean that it is due to the potential loss of parallelism rather than the lock itself being expensive.
In addition to the loss of parallelism, the small but non-zero costs of locking, synchronization, and context switches accumulate, so heavy use of locks can slow your algorithm down outright.
Also note that the more threads you have trying to access that lock at the same time, the more your algorithm will appear to run serially rather than in parallel. The OS will also have to spend more cycles juggling all the contexts through that small straw created by the lock.
On the other hand, the drawbacks of having locks can be mitigated by taking them less often, by avoiding thrashing (lock once and do the work, rather than locking/unlocking many times inside a tight loop), or by using pipelines or producer/consumer patterns (signaling with condition variables).
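As a sketch of the producer/consumer idea in Python (the function names are mine, not from any particular library): `queue.Queue` signals with an internal condition variable, so the consumer sleeps until work arrives instead of repeatedly grabbing a lock to poll.

```python
import queue
import threading

# A minimal producer/consumer pipeline. queue.Queue handles the locking
# and condition-variable signaling internally.
work = queue.Queue()
results = []

def producer(n):
    for i in range(n):
        work.put(i)
    work.put(None)                    # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = work.get()             # blocks (sleeps) until an item arrives
        if item is None:
            break
        results.append(item * item)   # "process" the item

p = threading.Thread(target=producer, args=(5,))
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()

print(results)  # [0, 1, 4, 9, 16]
```

The point of the pattern is that contention is confined to short put/get operations instead of spreading through the whole computation.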
One trick for lockless operations includes doing all your shared data initialization before spawning any threads and only reading from that data after spawning.
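A small Python sketch of that trick (the `LOOKUP` table and `worker` function are hypothetical): the shared table is fully built before any thread starts, so the threads can read it with no lock at all; only the mutable results list still needs one.

```python
import threading

# Initialize the shared data once, BEFORE spawning any threads...
LOOKUP = {word: len(word) for word in ("alpha", "beta", "gamma")}

totals = []
totals_lock = threading.Lock()   # only the mutable result list needs a lock

def worker(words):
    # ...then threads only ever read LOOKUP: no lock needed for the reads
    total = sum(LOOKUP[w] for w in words)
    with totals_lock:
        totals.append(total)

threads = [threading.Thread(target=worker, args=(("alpha", "beta"),)),
           threading.Thread(target=worker, args=(("gamma",),))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(totals))  # [5, 9]
```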
One last comment: locks are needed to avoid race conditions on a shared resource. Contentions are just a consequence of having locks - it just means one thread can be blocked/waiting on a lock that another thread has locked. How often contentions happen actually depends on many factors: number of threads vs cores, how much time is spent in the lock, luck of the execution (dependent on the scheduling algorithm), the state of your OS during the run, etc...