The program is launched in work.pbs with:
mpirun -np `wc -l < $PBS_NODEFILE` -machine `echo $PBS_NODEFILE` ./run
After submitting with qsub work.pbs, the job hung and never finished. Killing it with qdel produced the following errors:
[mpiexec@cu08] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:221): assert (!closed) failed
[mpiexec@cu08] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:128): unable to send SIGUSR1 downstream
[mpiexec@cu08] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@cu08] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[mpiexec@cu08] main (./ui/mpich/mpiexec.c:718): process manager error waiting for completion
I then added the following line to work.pbs, before the line that launches the program:
cat $PBS_NODEFILE
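to see which nodes the job was using. It turned out that one of them could no longer be reached over ssh, which presumably left mpirun waiting on that node. After switching to other nodes, the job ran normally.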
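
The manual check above could be automated inside work.pbs. The following is only a minimal sketch, assuming bash and passwordless ssh between the compute nodes; the function name unreachable_nodes and the 5-second timeout are hypothetical choices, not something from the original script:

```shell
#!/bin/bash
# Hypothetical helper: print every unique host in a node file that fails a probe.
# Usage: unreachable_nodes NODEFILE [PROBE_CMD ...]
# With no probe given, a non-interactive ssh with a short timeout is used (assumption).
unreachable_nodes() {
    local nodefile=$1
    shift
    local probe=("$@")
    if [ ${#probe[@]} -eq 0 ]; then
        # -n keeps ssh from reading stdin; BatchMode avoids password prompts
        probe=(ssh -n -o BatchMode=yes -o ConnectTimeout=5)
    fi
    local node
    # PBS lists a host once per allocated core, so de-duplicate first
    for node in $(sort -u "$nodefile"); do
        # Run a trivial command on the node; report the node if that fails
        if ! "${probe[@]}" "$node" true >/dev/null 2>&1; then
            echo "$node"
        fi
    done
}

# Only run the check when inside a PBS job
if [ -n "${PBS_NODEFILE:-}" ]; then
    bad=$(unreachable_nodes "$PBS_NODEFILE")
    if [ -n "$bad" ]; then
        echo "Unreachable nodes:" $bad >&2
        exit 1
    fi
fi
```

Placed before the mpirun line, this would make the job fail immediately with the name of the bad node instead of hanging until it is killed with qdel.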
Source: CSDN
Author: djdaj
Link: https://blog.csdn.net/djdaj/article/details/104503315