I will soon be tasked with doing a proper memory profile of code that is written in C/C++ and uses CUDA to take advantage of GPU processing.
My initial thou
You could use the profiler included in Visual Studio 2010 Premium and Ultimate.
It lets you choose between several methods of measuring performance. The most useful one for you will probably be CPU sampling: it briefly pauses your program at regular intervals and records which functions are executing at that moment, so it does not make your program run substantially slower.
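To make the sampling idea concrete, here is a toy sketch in C++ of how periodic sampling attributes time to functions. It is only an illustration of the principle, not how the Visual Studio profiler is implemented: a real sampling profiler inspects the instruction pointer and call stack, whereas this sketch assumes the program cooperates by publishing its current function name. All names (`g_current`, `Scope`, `busy_for`, `cheap_step`, `costly_step`, `profile_workload`) are made up for the example.

```cpp
#include <atomic>
#include <chrono>
#include <map>
#include <string>
#include <thread>

// Hypothetical marker for "which function is running right now".
// A real sampling profiler reads the call stack instead.
static std::atomic<const char*> g_current{"idle"};

// RAII helper: publishes a function name on entry, restores the old one on exit.
struct Scope {
    const char* prev;
    explicit Scope(const char* name) : prev(g_current.exchange(name)) {}
    ~Scope() { g_current.store(prev); }
};

// Spins for roughly the given duration to simulate CPU-bound work.
static void busy_for(std::chrono::milliseconds d) {
    const auto end = std::chrono::steady_clock::now() + d;
    volatile unsigned long sink = 0;
    while (std::chrono::steady_clock::now() < end) sink = sink + 1;
}

static void cheap_step()  { Scope s("cheap_step");  busy_for(std::chrono::milliseconds(20)); }
static void costly_step() { Scope s("costly_step"); busy_for(std::chrono::milliseconds(80)); }

// Runs the workload while a sampler thread records the marker about once
// per millisecond. Returns the number of samples seen per function name.
std::map<std::string, int> profile_workload() {
    std::map<std::string, int> hits;  // touched only by the sampler until join()
    std::atomic<bool> done{false};

    std::thread sampler([&] {
        while (!done.load()) {
            ++hits[g_current.load()];
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    for (int i = 0; i < 3; ++i) {
        cheap_step();
        costly_step();
    }

    done.store(true);
    sampler.join();
    return hits;
}
```

Calling `profile_workload()` from `main` and printing the map should show most samples landing in `costly_step`, because the workload spends most of its time there. The workload itself is not instrumented beyond the marker, which is why sampling adds so little overhead compared with timing every call.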