In my multithreaded application and I see heavy lock contention in it, preventing good scalability across multiple cores. I have decided to use lock free programming to solv
Use an existing implementation, as this area of work is the realm of domain experts and PhDs (if you want it done right!)
For example there is a library of code here:
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/