I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizin
If you are using Windows, check out Nexus:
http://developer.nvidia.com/object/nexus.html