Question
I am trying to learn MPI. When I send data from one processor to another, the transfer succeeds and the other processor receives it into a variable. But when I try to send and receive on both processors, I get an invalid rank error.
Here is my code:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
  int world_size;
  int rank;
  char hostname[256];
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  int tag = 4;
  int value = 4;
  int master = 0;
  int rec;
  MPI_Status status;
  // Initialize the MPI environment
  MPI_Init(&argc,&argv);
  // get the total number of processes
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  // get the rank of current process
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // get the name of the processor
  MPI_Get_processor_name(processor_name, &name_len);
  // get the hostname
  gethostname(hostname,255);
  printf("World size is %d\n",world_size);
  if(rank == master){
        MPI_Send(&value,1,MPI_INT,1,tag,MPI_COMM_WORLD);
        MPI_Recv(&rec,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
        printf("In master with value %d\n",rec);
  }
  if(rank == 1){
        MPI_Send(&tag,1,MPI_INT,0,tag,MPI_COMM_WORLD);
        MPI_Recv(&rec,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
        printf("in slave with rank %d and value %d\n",rank, rec);
  }
  printf("Hello world!  I am process number: %d from processor %s on host %s out of %d processors\n", rank, processor_name, hostname, world_size);
  MPI_Finalize();
  return 0;
}
Here is my PBS file:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00
#PBS -N MPIsample
#PBS -q edu_shared
#PBS -m abe
#PBS -M blahblah@blah.edu
#PBS -e mpitest.err
#PBS -o mpitest.out
#PBS -d /export/home/blah/MPIsample
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest
The output file looks like this:
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Job complete
If the world size is 1, it should be printed once, not 8 times.
The error file contains:
[compute-0-34.local:13110] *** An error occurred in MPI_Send
[compute-0-34.local:13110] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13110] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13110] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13107] *** An error occurred in MPI_Send
[compute-0-34.local:13107] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13107] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13107] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13112] *** An error occurred in MPI_Send
[compute-0-34.local:13112] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13112] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13112] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13108] *** An error occurred in MPI_Send
[compute-0-34.local:13108] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13108] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13108] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13109] *** An error occurred in MPI_Send
[compute-0-34.local:13109] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13109] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13109] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13113] *** An error occurred in MPI_Send
[compute-0-34.local:13113] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13113] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13113] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13106] *** An error occurred in MPI_Send
[compute-0-34.local:13106] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13106] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13106] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13111] *** An error occurred in MPI_Send
[compute-0-34.local:13111] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13111] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13111] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Two days ago I was able to send and receive on both processors, but now the same code, which used to work, produces this error. Is the problem in my code or in the high-performance computing (HPC) cluster I am working on?
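The log is consistent with every process seeing a one-process MPI_COMM_WORLD: each prints "World size is 1", so each is rank 0, takes the rank == master branch, and calls MPI_Send with destination 1, a rank that does not exist in a communicator of size 1. That is why eight different PIDs (13106 to 13113) each report MPI_ERR_RANK. Whatever the launch problem turns out to be, the program can avoid the crash by checking the peer rank against the world size first. The sketch below is plain C with hypothetical helper names (rank_is_valid, exchange_peer, NO_PEER) so it compiles without MPI; in the real program you would call exchange_peer(rank, world_size) after MPI_Comm_size and skip the MPI_Send/MPI_Recv pair when it returns NO_PEER.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "this rank has no peer". In real MPI code you
   would skip the calls or pass MPI_PROC_NULL instead. */
#define NO_PEER (-1)

/* A send destination must satisfy 0 <= peer < world_size (or be MPI_PROC_NULL);
   Open MPI reports a violation as MPI_ERR_RANK, as in the log above. */
int rank_is_valid(int peer, int world_size) {
    assert(world_size >= 1);   /* MPI_Comm_size never reports fewer than 1 */
    return peer >= 0 && peer < world_size;
}

/* Peer for the rank 0 <-> rank 1 exchange in the question's program,
   or NO_PEER when this rank should not communicate at all. */
int exchange_peer(int rank, int world_size) {
    int peer;
    if (rank == 0)
        peer = 1;
    else if (rank == 1)
        peer = 0;
    else
        return NO_PEER;        /* ranks 2..world_size-1 sit out */
    /* With world_size == 1, rank 0's peer (1) is invalid, so it sits out too. */
    return rank_is_valid(peer, world_size) ? peer : NO_PEER;
}
```

Passing MPI_PROC_NULL as the peer instead of skipping the calls is another option, since the MPI standard defines sends and receives involving MPI_PROC_NULL as no-ops.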
Answer 1:
From an MPI point of view, you did not launch one MPI job with 8 MPI tasks, but 8 independent MPI jobs with one MPI task each.
This typically happens when you mix two MPI implementations (for example, your application was built with Open MPI but you launch it with MPICH's mpirun).
I suggest adding these lines to your PBS script, before the mpirun line:
which mpirun
ldd mpitest
Make sure mpirun and the MPI libraries come from the same MPI installation (e.g. same vendor and same version).
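The same diagnosis can also be checked from inside the program. This is a minimal sketch that rests on an assumption not stated in the answer: that Open MPI's mpirun exports OMPI_COMM_WORLD_SIZE to each process and that MPICH's Hydra launcher exports PMI_SIZE. Verify both variable names against your MPI's documentation. The helper names (classify_launcher, report_launcher, enum launcher) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum launcher { LAUNCHER_UNKNOWN, LAUNCHER_OPEN_MPI, LAUNCHER_HYDRA_PMI };

/* Pure classification step, kept separate from getenv() so it is testable.
   ompi_size / pmi_size hold the values of OMPI_COMM_WORLD_SIZE and PMI_SIZE
   (assumed variable names), or NULL when unset. */
enum launcher classify_launcher(const char *ompi_size, const char *pmi_size,
                                int *size_out) {
    assert(size_out != NULL);
    if (ompi_size != NULL) {
        *size_out = atoi(ompi_size);
        return LAUNCHER_OPEN_MPI;
    }
    if (pmi_size != NULL) {
        *size_out = atoi(pmi_size);
        return LAUNCHER_HYDRA_PMI;
    }
    *size_out = -1;            /* no known launcher variables were found */
    return LAUNCHER_UNKNOWN;
}

/* Reads the current process's environment. Call it before MPI_Init()
   and print the result next to what MPI_Comm_size() reports. */
void report_launcher(FILE *out) {
    static const char *const names[] = {
        "unknown", "Open MPI mpirun", "Hydra (MPICH-style PMI)"
    };
    int size;
    enum launcher l = classify_launcher(getenv("OMPI_COMM_WORLD_SIZE"),
                                        getenv("PMI_SIZE"), &size);
    fprintf(out, "launcher: %s, advertised size: %d\n", names[l], size);
}
```

If every process reports an advertised size of 8 while MPI_Comm_size returns 1, the MPI library linked into mpitest probably did not recognise the launcher that started it, which is the mismatch described above.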
Answer 2:
There was a problem with the HPC cluster: it was not allotting me the required number of processors. Thanks, guys.
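One way to check what the scheduler actually granted is to count the lines in the file named by $PBS_NODEFILE. Under Torque/PBS it lists one hostname per allotted processor slot, so "#PBS -l nodes=1:ppn=8" should produce 8 lines. This is a minimal sketch with hypothetical helper names (count_slots, pbs_allotted_slots); printing the result next to the world size MPI reports would show whether the allocation or the launch is at fault.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts non-blank lines in a node file; each line is one processor slot. */
long count_slots(FILE *f) {
    assert(f != NULL);
    long slots = 0;
    int c, line_has_text = 0;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            slots += line_has_text;
            line_has_text = 0;
        } else if (c != '\r' && c != ' ' && c != '\t') {
            line_has_text = 1;
        }
    }
    return slots + line_has_text;   /* last line may lack a trailing newline */
}

/* Opens the file named by $PBS_NODEFILE; returns -1 if unset or unreadable. */
long pbs_allotted_slots(void) {
    const char *path = getenv("PBS_NODEFILE");
    if (path == NULL)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long n = count_slots(f);
    fclose(f);
    return n;
}
```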
Source: https://stackoverflow.com/questions/46538735/error-occurred-in-mpi-send-on-communicator-mpi-comm-world-mpi-err-rankinvalid-r