Python Multiprocessing with Distributed Cluster

清酒与你 2020-12-04 12:38

I am looking for a python package that can do multiprocessing not just across different cores within a single computer, but also with a cluster distributed across multiple m

4 answers
  •  一整个雨季
    2020-12-04 13:14

    If you want a very easy solution, there isn't one.

    However, there is a solution that has the multiprocessing interface: pathos. It can establish connections to remote servers and run a parallel map across them, as well as do local multiprocessing.

    You can connect through an ssh tunnel or, if you are OK with a less secure method, connect to the remote server directly.

    >>> # establish a ssh tunnel
    >>> from pathos.core import connect
    >>> tunnel = connect('remote.computer.com', port=1234)
    >>> tunnel       
    Tunnel('-q -N -L55774:remote.computer.com:1234 remote.computer.com')
    >>> tunnel._lport
    55774
    >>> tunnel._rport
    1234
    >>> 
    >>> # define some function to run in parallel
    >>> def sleepy_squared(x):
    ...   from time import sleep
    ...   sleep(1.0)
    ...   return x**2
    ... 
    >>> # build a pool of servers and execute the parallel map
    >>> from pathos.pp import ParallelPythonPool as Pool
    >>> p = Pool(8, servers=('localhost:55774',))
    >>> p.servers
    ('localhost:55774',)
    >>> x = list(range(10))  # inputs to square
    >>> y = p.map(sleepy_squared, x)
    >>> y
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    

    Alternatively, you could configure a direct connection (no ssh):

    >>> p = Pool(8, servers=('remote.computer.com:5678',))
    >>> # use an asynchronous parallel map
    >>> res = p.amap(sleepy_squared, x)
    >>> res.get()
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
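
The two calls above follow the standard-library pool pattern: `map` blocks until every result is ready, while pathos's `amap` plays the role of `map_async`, returning a handle whose `.get()` waits for the results. Here is a minimal local sketch of that pattern. It uses the thread-backed `multiprocessing.pool.ThreadPool` purely so that it runs without any servers; that choice is an assumption for illustration, and the sketch does not touch pathos or any remote host.

```python
from multiprocessing.pool import ThreadPool
from time import sleep


def sleepy_squared(x):
    # Same function as in the answer, with a shorter sleep.
    sleep(0.1)
    return x**2


x = list(range(10))

# ThreadPool is a stand-in here; it exposes the same map / map_async
# methods as multiprocessing.Pool.
with ThreadPool(4) as pool:
    # Blocking parallel map: returns the full list of results.
    y = pool.map(sleepy_squared, x)

    # Asynchronous parallel map: returns immediately with a result handle.
    res = pool.map_async(sleepy_squared, x)
    z = res.get()  # wait for and collect the results

print(y)
print(z)
```

With pathos, the pool would instead be built with `servers=(...)` as shown above, so that the work can be sent to the remote hosts.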
    

    It's all a bit finicky. For the remote server to work, you have to start a server running on remote.computer.com at the specified port beforehand, and you have to make sure that the settings on both your localhost and the remote host allow either the direct connection or the ssh-tunneled connection. Plus, you need to have the same version of pathos and of the pathos fork of pp running on each host. Also, for ssh, you need to have ssh-agent running to allow password-less login.
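
Since the server has to be listening before the pool is built, a quick reachability check can save some debugging. The sketch below uses only the standard library. The helper name `port_is_open` is hypothetical and not part of pathos, and the check only confirms that something accepts TCP connections on that port, not that it is a pp server of the matching version.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not verify what kind of
    service is listening on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the direct connection above this would be called as `port_is_open('remote.computer.com', 5678)`. For the tunneled case it would be `port_is_open('localhost', 55774)`, using the local port reported by the tunnel.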

    But then, hopefully it all works… if your function code can be transported over to the remote host with dill.source.importable.

    FYI, pathos is long overdue for a release; there are a few bugs and interface changes that need to be resolved before a new stable release is cut.
