I had a quick look on the forums and I don't think this question has been asked already.
I am currently working with an MPI/CUDA hybrid code, made by somebody else
Take a look at nvprof, part of the CUDA 5.0 Toolkit (currently available as a release candidate). There are some limitations: it can only collect a limited number of counters in a given pass, and it cannot collect metrics, so for now you'd have to script multiple launches if you want more than a few events. You can get more information from the nvvp built-in help, including an example MPI launch script. It is copied below, but if you have anything newer than the 5.0 RC, I suggest you check the nvvp help for an up-to-date version.
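To make "script multiple launches" concrete, here is a rough sketch of a driver that runs one MPI job per group of events. It is not from the nvvp help. It assumes the launch script copied below is saved as ./nvprof-script, that the application is ./a.out, and that nvprof's --events option takes a comma-separated list. The function names, the NP, APP and DRY_RUN variables, and the events-passN output names are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: one MPI launch per event group, because nvprof collects a
# limited number of counters per pass.
# Assumptions: ./nvprof-script is the launch script from the nvvp help,
# ./a.out is the application, NP/APP/DRY_RUN are hypothetical knobs.

# Build the command line for one pass. $1 = pass number, $2 = event list.
# -c is used so the application name is kept separate from nvprof options.
pass_command () {
    echo "mpirun -np ${NP:-4} ./nvprof-script --events $2 -o events-pass$1 -c ${APP:-./a.out}"
}

# Run one pass per event group given as an argument.
# With DRY_RUN set, the commands are only printed.
run_passes () {
    pass=0
    for events in "$@"; do
        pass=$((pass + 1))
        cmd=$(pass_command "$pass" "$events")
        if [ -n "$DRY_RUN" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the command is split into words.
            $cmd || return 1
        fi
    done
}
```

You would call it with something like `run_passes inst_executed,branch gld_request,gst_request`, but treat those event names as examples only: the available events depend on the device, and `nvprof --query-events` lists them. Each pass ends up in its own set of per-rank files (events-pass1.0, events-pass1.1, and so on).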
#!/bin/sh
#
# Script to launch nvprof on an MPI process. This script will
# create unique output file names based on the rank of the
# process. Examples:
# mpirun -np 4 nvprof-script a.out
# mpirun -np 4 nvprof-script -o outfile a.out
# mpirun -np 4 nvprof-script test/a.out -g -j
# In the case you want to pass a -o or -h flag to the a.out, you
# can do this.
# mpirun -np 4 nvprof-script -c a.out -h -o
# You can also pass in arguments to nvprof
# mpirun -np 4 nvprof-script --print-api-trace a.out
#
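usage () {
    echo "nvprof-script [nvprof options] [-h] [-o outfile] a.out [a.out options]";
    echo "or"
    echo "nvprof-script [nvprof options] [-h] [-o outfile] -c a.out [a.out options]";
}
# Parse the arguments: -o names the output file, -c marks the start of the
# application command line, -h prints the usage. Any other argument
# (including the application and its options when -c is not used) is
# appended to the nvprof command line.
nvprof_args=""
while [ $# -gt 0 ];
do
    case "$1" in
        (-o) shift; outfile="$1";;
        (-c) shift; break;;
        (-h) usage; exit 1;;
        (*) nvprof_args="$nvprof_args $1";;
    esac
    shift
done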
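# If user did not provide output filename then create one
if [ -z "$outfile" ] ; then
    if [ $# -gt 0 ] ; then
        outfile=`basename "$1"`.nvprof-out
    else
        # Without -c the application name was consumed by the loop above,
        # so there is nothing left in $1 to build the name from.
        outfile=nvprof-out
    fi
fi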
# Find the rank of the process from the MPI rank environment variable
# to ensure unique output filenames. The script handles Open MPI
# and MVAPICH. If your implementation is different, you will need to
# make a change here.
# Open MPI
if [ -n "${OMPI_COMM_WORLD_RANK}" ] ; then
    rank=${OMPI_COMM_WORLD_RANK}
fi
# MVAPICH
if [ -n "${MV2_COMM_WORLD_RANK}" ] ; then
    rank=${MV2_COMM_WORLD_RANK}
fi
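# Set the nvprof command and arguments.
NVPROF="nvprof --output-profile $outfile.$rank $nvprof_args"
# $NVPROF is left unquoted so it splits into separate words; "$@" keeps
# each application argument intact even if it contains spaces.
exec $NVPROF "$@"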
# If you want to limit which ranks get profiled, do something like
# this. You have to use the -c switch to get the right behavior.
# mpirun -np 2 nvprof-script --print-api-trace -c a.out -q
# if [ "$rank" -le 0 ]; then
#     exec $NVPROF "$@"
# else
#     exec "$@"
# fi
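
If your MPI implementation is not one of the two the script handles, the rank lookup could be extended along these lines. This is only a sketch: OMPI_COMM_WORLD_RANK and MV2_COMM_WORLD_RANK come from the script above, while PMI_RANK (set by MPICH-style launchers) and SLURM_PROCID (set by Slurm's srun) are my additions, so check them against your own launcher. The find_rank name and the fallback to 0 are also just choices made for this example.

```shell
#!/bin/sh
# Sketch only: look up the MPI rank from several launchers' environment
# variables. PMI_RANK and SLURM_PROCID are assumptions to verify against
# your launcher; the first variable that is set wins.
find_rank () {
    for var in OMPI_COMM_WORLD_RANK MV2_COMM_WORLD_RANK PMI_RANK SLURM_PROCID; do
        # Indirect lookup: read the value of the variable named in $var.
        eval "value=\${$var}"
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done
    # No known variable is set: fall back to 0 so the output file name
    # does not end in a bare dot.
    echo 0
}

rank=$(find_rank)
```

In the script above this would replace the two if blocks for Open MPI and MVAPICH. Note that when no rank variable is found every process gets rank 0, so the output files would collide if you really are running more than one process.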