I am working with CUDA on Windows, where we have access to both Parallel Nsight and Visual Profiler. Both are pretty good but then they have
EDIT (change of mind): After reevaluating both NVIDIA Parallel Nsight and Visual Profiler, I now find Parallel Nsight much better for performance analysis.
The reasons are explained further in @Jeff Davis's answer.