In my multithreaded application and I see heavy lock contention in it, preventing good scalability across multiple cores. I have decided to use lock free programming to solv
Writing thread-safe lock free code is hard; but this article from Herb Sutter will get you started.
Reduce or eliminate shared mutable state.
Can you clarify what you mean by structure?
Right now, I am assuming you mean the overall architecture. You can accomplish it by not sharing memory between processes, and by using an actor model for your processes.
Immutability is one approach to avoid locking. See Eric Lippert's discussion and implementation of things like immutable stacks and queues.
As my professor (Nir Shavit from "The Art of Multiprocessor Programming") told the class: Please don't. The main reason is testability - you can't test synchronization code. You can run simulations, you can even stress test. But it's rough approximation at best. What you really need is mathematical correctness proof. And very few capable understanding them, let alone writing them. So, as others had said: use existing libraries. Joe Duffy's blog surveys some techniques (section 28). The first one you should try is tree-splitting - break to smaller tasks and combine.
Take a look at my link ConcurrentLinkedHashMap for an example of how to write a lock-free data structure. It is not based on any academic papers and doesn't require years of research as others imply. It simply takes careful engineering.
My implementation does use a ConcurrentHashMap, which is a lock-per-bucket algorithm, but it does not rely on that implementation detail. It could easily be replaced with Cliff Click's lock-free implementation. I borrowed an idea from Cliff, but used much more explicitly, is to model all CAS operations with a state machine. This greatly simplifies the model, as you'll see that I have psuedo locks via the 'ing states. Another trick is to allow laziness and resolve as needed. You'll see this often with backtracking or letting other threads "help" to cleanup. In my case, I decided to allow dead nodes on the list be evicted when they reach the head, rather than deal with the complexity of removing them from the middle of the list. I may change that, but I didn't entirely trust my backtracking algorithm and wanted to put off a major change like adopting a 3-node locking approach.
The book "The Art of Multiprocessor Programming" is a great primer. Overall, though, I'd recommend avoiding lock-free designs in the application code. Often times it is simply overkill where other, less error prone, techniques are more suitable.