MPICH communication failed

Submitted by 老子叫甜甜 on 2019-12-11 03:07:20

Question


I have a simple MPICH program in which processes send and receive messages from each other in ring order.
I've set up two identical virtual machines and made sure the network is working fine. I've tested a simple MPICH program on each machine separately and it works fine. The problem arises when I try to communicate between processes on different machines, as in the ring program described above. I'm getting the following error:

Fatal error in MPI_Send: A process has failed, error stack:
MPI_Send(171)...............: MPI_Send(buf=0xbfed8c08, count=1, MPI_INT, dest=1,
tag=1, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1826): Communication error with rank 1: Connection refused

  • SSH is passwordless & works fine on both sides.
  • /etc/hosts is configured properly.
  • Firewall is disabled on both machines.
  • Configured NFS Client/Server and shared a directory between them. (According to this)
  • Tried both MPICH & OpenMPI with Hydra
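
One thing worth ruling out for the /etc/hosts point: a frequently reported cause of "Connection refused" between ranks on different hosts is that a machine's own hostname resolves to a loopback address (some distributions add a `127.0.1.1 <hostname>` line), so the process may advertise an address the other machine cannot reach. A minimal sketch of such a check follows; the helper name `loopback_entries` and the sample addresses are made up for illustration, and this is only one possible cause, not a confirmed diagnosis of the error above.

```shell
# Hypothetical helper: print every line of a hosts-format file that maps NAME
# to a loopback address (127.x.x.x). Comment lines are skipped.
loopback_entries() {
    file=$1
    name=$2
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }            # skip full-line comments
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore a trailing comment
                if ($i == name) { print; break }
            }
        }
    ' "$file"
}

# Example on a throwaway file (addresses and names are assumptions):
sample=$(mktemp)
cat > "$sample" <<'EOF'
127.0.0.1   localhost
127.0.1.1   node1
192.168.56.101 node1
192.168.56.102 node2
EOF
loopback_entries "$sample" node1    # prints the 127.0.1.1 line
loopback_entries "$sample" node2    # prints nothing
rm -f "$sample"
```

On a real machine this would be run as `loopback_entries /etc/hosts "$(hostname)"` on each node; any line it prints is a candidate for removal or correction.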

Answer 1:


Here is what I did, and it works!

Installed the following packages from source (tarball):

hydra 
openmpi

Created a hosts file (on both nodes):

# cat /home/spatel/mpi/hydra/hosts
node1
node2 

Set the variable in .bashrc (on both nodes). It must be exported so that mpiexec can see it:

echo 'export HYDRA_HOST_FILE=/home/spatel/mpi/hydra/hosts' >> ~/.bashrc
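
mpiexec runs as a child process of the shell, so a variable set in .bashrc only reaches it if it is exported. Here is a small sketch for checking that; the helper name `visible_to_children` and the variable `DEMO_HOST_FILE` are made up for the demonstration.

```shell
# Hypothetical helper: report whether variable NAME is visible to child processes.
visible_to_children() {
    # A child shell inherits only exported variables, so test it from there.
    if sh -c "[ -n \"\${$1+set}\" ]"; then
        echo "exported"
    else
        echo "not exported"
    fi
}

DEMO_HOST_FILE=/tmp/hosts            # plain assignment: local to this shell
visible_to_children DEMO_HOST_FILE   # prints "not exported"
export DEMO_HOST_FILE                # now part of the environment
visible_to_children DEMO_HOST_FILE   # prints "exported"
```

After opening a new login shell, `visible_to_children HYDRA_HOST_FILE` should print "exported" if the .bashrc line took effect.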

Ran the HelloWorld MPI program on a single node:

node1# /home/spatel/mpi/hydra/bin/mpiexec -np 1 /home/spatel/mpi/mpi_hello_world
Hello world from processor node1.example.com, rank 0 out of 1 processors

Ran it on multiple nodes using the -machinefile option (-np is the number of processes):

node1# /home/spatel/mpi/hydra/bin/mpiexec -np 4 -machinefile /home/spatel/mpi/hydra/hosts /home/spatel/mpi/mpi_hello_world
Hello world from processor node1.example.com, rank 0 out of 1 processors
Hello world from processor node2.example.com, rank 0 out of 1 processors
Hello world from processor node1.example.com, rank 0 out of 1 processors
Hello world from processor node2.example.com, rank 0 out of 1 processors
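
Note that every line in the output above reports "rank 0 out of 1". In a single 4-process job the ranks would normally be 0 to 3 out of 4. When the launcher and the MPI library the program was built with do not match, each process can end up starting as its own single-rank job, so the output is worth checking before relying on inter-process communication. A rough sketch of such a check, assuming the output format shown above (the helper name `check_world` is made up):

```shell
# Hypothetical helper: read "Hello world" lines on stdin and verify that they
# describe one job of EXPECTED ranks: every line says "out of EXPECTED" and
# each rank 0..EXPECTED-1 appears exactly once.
check_world() {
    awk -v n="$1" '
        {
            r = ""; s = ""
            for (i = 1; i < NF; i++) {
                if ($i == "rank") r = $(i + 1)   # number after "rank"
                if ($i == "of")   s = $(i + 1)   # number after "out of"
            }
            if (s != n) bad = 1
            seen[r]++
        }
        END {
            if (NR != n) bad = 1
            for (k = 0; k < n; k++) if (seen[k] != 1) bad = 1
            print (bad ? "mismatch" : "ok")
        }
    '
}

# Fed with the four lines printed above, the check reports a mismatch:
check_world 4 <<'EOF'
Hello world from processor node1.example.com, rank 0 out of 1 processors
Hello world from processor node2.example.com, rank 0 out of 1 processors
Hello world from processor node1.example.com, rank 0 out of 1 processors
Hello world from processor node2.example.com, rank 0 out of 1 processors
EOF
```

In practice the mpiexec output would be piped straight into `check_world 4`. If it reports a mismatch, one thing to try is launching with the mpiexec that belongs to the same MPI installation whose compiler wrapper built the program.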


Source: https://stackoverflow.com/questions/14571203/mpich-communication-failed
