I had a quick look on the forums and I don't think this question has been asked already.
I am currently working with an MPI/CUDA hybrid code, made by somebody else.
Things have changed since CUDA 5.0: instead of using a wrapper script, we can now simply put %h (hostname), %p (process ID) and %q{ENV} (value of the environment variable ENV) in the nvprof output file name, as mentioned here:
$ mpirun -np 2 -host c0-0,c0-1 nvprof -o output.%h.%p.%q{OMPI_COMM_WORLD_RANK} ./my_mpi_app
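
To make the naming concrete, here is a rough sketch of what the file name ends up looking like for one rank. The function `expand_nvprof_name` is a hypothetical helper I made up for illustration. It is not part of nvprof and only mimics the substitutions described above, and the host name, PID and rank values are assumed. It also relies on Open MPI setting OMPI_COMM_WORLD_RANK for each launched process.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of nvprof: it only mimics the substitutions
# so you can preview the file name a given rank would write.
expand_nvprof_name() {
    local name="$1" host="$2" pid="$3"
    local re='^(.*)%q[{]([A-Za-z_][A-Za-z0-9_]*)[}](.*)$'
    local pre var post

    # %q{VAR} -> value of environment variable VAR (empty if unset)
    while [[ $name =~ $re ]]; do
        pre="${BASH_REMATCH[1]}"
        var="${BASH_REMATCH[2]}"
        post="${BASH_REMATCH[3]}"
        name="${pre}${!var}${post}"
    done

    # %h -> host name, %p -> process ID
    name="${name//\%h/$host}"
    name="${name//\%p/$pid}"
    printf '%s\n' "$name"
}

# Assumed values: rank 1 running on host c0-1 with PID 4242.
OMPI_COMM_WORLD_RANK=1 expand_nvprof_name 'output.%h.%p.%q{OMPI_COMM_WORLD_RANK}' c0-1 4242
# prints: output.c0-1.4242.1
```

So with the mpirun command above, each rank should write its own profile file and the ranks do not overwrite each other's output.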