In my multithreaded application and I see heavy lock contention in it, preventing good scalability across multiple cores. I have decided to use lock free programming to solv
Take a look at my link ConcurrentLinkedHashMap for an example of how to write a lock-free data structure. It is not based on any academic papers and doesn't require years of research as others imply. It simply takes careful engineering.
My implementation does use a ConcurrentHashMap, which is a lock-per-bucket algorithm, but it does not rely on that implementation detail. It could easily be replaced with Cliff Click's lock-free implementation. I borrowed an idea from Cliff, but used much more explicitly, is to model all CAS operations with a state machine. This greatly simplifies the model, as you'll see that I have psuedo locks via the 'ing states. Another trick is to allow laziness and resolve as needed. You'll see this often with backtracking or letting other threads "help" to cleanup. In my case, I decided to allow dead nodes on the list be evicted when they reach the head, rather than deal with the complexity of removing them from the middle of the list. I may change that, but I didn't entirely trust my backtracking algorithm and wanted to put off a major change like adopting a 3-node locking approach.
The book "The Art of Multiprocessor Programming" is a great primer. Overall, though, I'd recommend avoiding lock-free designs in the application code. Often times it is simply overkill where other, less error prone, techniques are more suitable.