Question
I would like to print "hello world" via MPI on several Google Compute Engine instances using the following code:
from mpi4py import MPI
# Total number of processes, this process's rank, and the host it runs on
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
print("Hello, World! I am process/rank {} of {} on {}.\n".format(rank, size, name))
The problem is that, even though I can SSH between all of these instances without any trouble, I get a "Permission denied" error when I try to run my script. I use the following command to invoke it:
mpirun --host localhost,instance_1,instance_2 python hello_world.py
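For reference, the same invocation can be assembled programmatically. This is a minimal Python sketch; the helper names build_mpirun_command and launch are hypothetical and not part of MPI or mpi4py. It only rebuilds the command above, where --host takes one comma-separated list of hosts. For every remote host in that list, mpirun typically starts a helper process on that machine over plain ssh, which is where the authentication errors below originate.

```python
import subprocess
from typing import List, Sequence


def build_mpirun_command(hosts: Sequence[str], script: str = "hello_world.py") -> List[str]:
    """Assemble `mpirun --host h1,h2,... python <script>` as an argument list."""
    if not hosts:
        raise ValueError("at least one host is required")
    # --host expects a single comma-separated list of host names
    return ["mpirun", "--host", ",".join(hosts), "python", script]


def launch(hosts: Sequence[str], script: str = "hello_world.py") -> int:
    """Run mpirun and return its exit code (non-zero typically means a launch failure)."""
    return subprocess.run(build_mpirun_command(hosts, script)).returncode


if __name__ == "__main__":
    # Host names taken from the question; adjust them to your own instances.
    print(" ".join(build_mpirun_command(["localhost", "instance_1", "instance_2"])))
```

Building an argument list rather than a shell string avoids quoting problems if a host or script name ever contains special characters.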
I get the following error message:
Permission denied (publickey).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
Additional information:
- I installed Open MPI on all of my nodes
- I let Google set up all of my SSH keys automatically by using gcloud to log into each instance from every other instance
- Instance type: n1-standard-1
- Instance OS: Debian Linux (default)
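Because mpirun logs in to the remote hosts with plain, non-interactive ssh, one way to see what it sees is to attempt the same kind of login yourself from the launching node. The Python sketch below (function names are hypothetical) runs `ssh -o BatchMode=yes <host> true` for each host. With BatchMode, ssh fails instead of prompting, so a missing key surfaces as "Permission denied (publickey)." and an unknown host key as "Host key verification failed." It assumes the OpenSSH client is installed. A working `gcloud compute ssh` does not guarantee this check passes, since gcloud authenticates with its own key file (by default ~/.ssh/google_compute_engine), which plain ssh does not use unless configured to.

```python
import subprocess
from typing import Dict, List, Sequence


def batch_ssh_command(host: str) -> List[str]:
    """Plain ssh login that never prompts: BatchMode=yes makes ssh fail instead of
    asking for a password/passphrase or for confirmation of an unknown host key."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def check_hosts(hosts: Sequence[str]) -> Dict[str, str]:
    """Return {host: error text} for every host that rejects a non-interactive login."""
    failures: Dict[str, str] = {}
    for host in hosts:
        try:
            proc = subprocess.run(batch_ssh_command(host), capture_output=True, text=True)
        except FileNotFoundError:
            failures[host] = "ssh client not found on this machine"
            continue
        if proc.returncode != 0:
            # e.g. "Permission denied (publickey)." or "Host key verification failed."
            failures[host] = proc.stderr.strip() or f"ssh exited with {proc.returncode}"
    return failures
```

For example, check_hosts(["instance_1", "instance_2"]) returns an empty dict when every host accepts the non-interactive login; otherwise it maps each failing host to ssh's error output.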
Thank you for your help :-)
New information:
(thanks to @Zulan for pointing out that I should edit my original post instead of posting new information as an answer)
I tried the same thing with MPICH instead of Open MPI, but I ran into a similar error message.
Command:
mpirun --host localhost,instance_1,instance_2 python hello_world.py
Error message:
Host key verification failed.
I can SSH between my two instances without problems, and since I used the gcloud commands, the SSH keys should have been set up automatically.
Does anybody have an idea what the problem could be? I have also checked the PATH, the firewall rules, and whether I can write startup files to the temp folder. Could someone try to reproduce this problem? Also, should I raise this question with Google? (I have never done that before, so I'm quite unsure :S)
Thanks for helping :)
Answer 1:
So I finally found a solution. Wow, this problem was driving me nuts.
It turned out that I needed to generate SSH keys manually for the script to work. I have no idea why, because Google's services had already set up keys when I used gcloud compute ssh, but well, it worked :)
Steps I took:
instance_1 $ ssh-keygen -t rsa
instance_1 $ cd .ssh
instance_1 $ cat id_rsa.pub >> authorized_keys
instance_1 $ gcloud compute copy-files id_rsa.pub
instance_1 $ gcloud compute ssh instance_2
instance_2 $ cd .ssh
instance_2 $ cat id_rsa.pub >> authorized_keys
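The manual steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the answer's exact procedure: the helper names ensure_keypair and authorize_key are hypothetical, the key is assumed to live at the default ~/.ssh/id_rsa, and it is created with an empty passphrase (-N "") so that mpirun's non-interactive ssh login cannot stall at a passphrase prompt. Copying id_rsa.pub to instance_2 is still done with gcloud as in the steps; authorize_key plays the role of `cat id_rsa.pub >> authorized_keys`, but skips a key that is already present.

```python
import os
import subprocess
from pathlib import Path


def ensure_keypair(ssh_dir: Path) -> Path:
    """Create ssh_dir/id_rsa(.pub) if missing, like `ssh-keygen -t rsa` with defaults.
    Assumption: an empty passphrase, so non-interactive ssh never prompts."""
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    private_key = ssh_dir / "id_rsa"
    if not private_key.exists():
        subprocess.run(
            ["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", str(private_key)],
            check=True,
        )
    return ssh_dir / "id_rsa.pub"


def authorize_key(pub_key: str, authorized_keys: Path) -> bool:
    """Append pub_key to authorized_keys unless it is already listed.
    Returns True if the key was appended, False if it was already there."""
    pub_key = pub_key.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if pub_key in (line.strip() for line in text.splitlines()):
        return False  # already authorized; nothing to do
    with authorized_keys.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the new key onto an unterminated last line
        f.write(pub_key + "\n")
    # sshd's StrictModes check can reject authorized_keys files writable by others
    os.chmod(authorized_keys, 0o600)
    return True
```

For example, on instance_1 you might call pub = ensure_keypair(Path.home() / ".ssh") followed by authorize_key(pub.read_text(), Path.home() / ".ssh" / "authorized_keys"); on instance_2, call authorize_key with the contents of the copied id_rsa.pub.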
I will open another topic to ask why I cannot use ssh instance_2 even though gcloud compute ssh instance_2 works. See: Difference between the commands "gcloud compute ssh" and "ssh"
Source: https://stackoverflow.com/questions/35819556/openmpi-permission-denied-error-while-trying-to-use-mpirun