OpenMPI 1.4.3 mpirun hostfile error

Anonymous (unverified), submitted 2019-12-03 00:50:01

Question:

I am trying to run a simple MPI program on 4 nodes, using OpenMPI 1.4.3 on CentOS 5.5. When I run mpirun with a hostfile/machinefile, I get no output, just a blank screen, so I have to kill the job.

I use the following run command: mpirun --hostfile hostfile -np 4 new46

Output on killing the job:

    mpirun: killing job...
    --------------------------------------------------------------------------
    mpirun noticed that the job aborted, but has no info as to the process that caused
    that situation.
    --------------------------------------------------------------------------
    mpirun was unable to cleanly terminate the daemons on the nodes shown
    below. Additional manual cleanup may be required - please refer to
    the "orte-clean" tool for assistance.
    --------------------------------------------------------------------------
        myocyte46 - daemon did not report back when launched
        myocyte47 - daemon did not report back when launched
        myocyte49 - daemon did not report back when launched

Here is the relevant part of the MPI program I am trying to execute on 4 nodes:

    **************************
    /* Every non-root rank sends a greeting to rank 0. */
    if (my_rank != 0)
    {
        sprintf(message, "Greetings from the process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else
    {
        /* Rank 0 receives and prints one message from each other rank. */
        for (source = 1; source < p; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    ****************************

My hostfile looks like this:

    [amohan@myocyte48 ~]$ cat hostfile
    myocyte46
    myocyte47
    myocyte48
    myocyte49
    *******************************

I compiled and ran the above MPI program independently on each of the nodes, and it worked fine every time. The "daemon did not report back when launched" error appears only when I use the hostfile, and I am trying to figure out what the cause could be.

Thanks!

Answer 1:

I think these lines

myocyte46 - daemon did not report back when launched 

are pretty clear: you're having trouble either launching the MPI daemons or communicating with them afterwards. So you need to start looking at networking. Can you ssh into these nodes without a password? Can you ssh from them back to the node you launch from? Leaving aside the MPI program, can you run

mpirun -np 4 hostname 

and get anything?
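
The ssh check suggested above can be scripted. The following is a minimal sketch, not a definitive diagnostic: the helper names list_hosts and check_hosts are made up for illustration, and it assumes the hostfile lists one host per line, optionally followed by fields such as slots=N, with '#' starting a comment line.

```shell
#!/bin/sh
# Hypothetical helper: print the host names from a hostfile.
# Blank lines and lines whose first field starts with '#' are skipped;
# only the first field is kept, so trailing fields like "slots=2" are dropped.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical helper: try a non-interactive ssh login to every host.
# BatchMode=yes makes ssh fail instead of prompting for a password, so a
# node that still needs a password shows up as FAILED rather than hanging.
check_hosts() {
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

Running check_hosts hostfile from the launching node (myocyte48 in the question) should report every node as ok before mpirun --hostfile hostfile -np 4 hostname can be expected to work. Repeating the check from each of the other nodes covers the "ssh back" direction.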


