Python Multiprocessing with Distributed Cluster

清酒与你 2020-12-04 12:38

I am looking for a Python package that can do multiprocessing not just across different cores within a single computer, but also with a cluster distributed across multiple m

4 answers
  •  南方客 (original poster)
    2020-12-04 13:01

    A little late to the party here, but since I was also looking for a similar solution, and this question is still not marked as answered, I thought I would contribute my findings.

    I ended up using SCOOP. It provides a parallel map implementation that works across multiple cores and across multiple hosts. It can also fall back to Python's serial map function, if desired, at invocation time.
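As a rough illustration of the parallel map described above, here is a minimal sketch. It assumes SCOOP is installed and exposes `futures.map` in the `scoop` package; `square` and `run_squares` are hypothetical names made up for this example, and the `ImportError` fallback to the built-in `map` is only there so the sketch still runs on a machine without SCOOP.

```python
# Sketch only: squares numbers with SCOOP's parallel map when it is installed.
try:
    from scoop import futures  # third-party package (pip install scoop)
    parallel_map = futures.map
except ImportError:
    # Assumption for this sketch: use the built-in serial map instead.
    parallel_map = map


def square(x):
    """Module-level function, so it can be pickled and sent to workers."""
    return x * x


def run_squares(n):
    """Hypothetical helper: map square() over range(n) and collect a list."""
    return list(parallel_map(square, range(n)))


if __name__ == "__main__":
    # The __main__ guard keeps worker processes from re-running this part.
    print(run_squares(5))  # prints [0, 1, 4, 9, 16]
```

To actually distribute the work, the script is launched through SCOOP's own launcher, for example `python -m scoop script.py` (the file name is a placeholder); how to list remote hosts is covered in the SCOOP documentation.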

    SCOOP's introduction page cites the following features:

    SCOOP features and advantages over futures, multiprocessing and similar modules are as follows:

    • Harness the power of multiple computers over network;
    • Ability to spawn multiple tasks inside a task;
    • API compatible with PEP-3148;
    • Parallelizing serial code with only minor modifications;
    • Efficient load-balancing.
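For readers unfamiliar with PEP 3148, the sketch below shows the submit/result pattern that PEP defines, using the standard library's `concurrent.futures` rather than SCOOP itself. If SCOOP is API compatible as the list claims, equivalent calls should exist in its futures module, but that mapping is an assumption here and is not shown. `cube` and `submit_cubes` are hypothetical names for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def cube(x):
    """Toy task standing in for real work."""
    return x ** 3


def submit_cubes(values):
    """Hypothetical helper: submit one task per value, then gather results."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() returns a Future immediately; result() blocks until done.
        pending = [executor.submit(cube, v) for v in values]
        return [future.result() for future in pending]


if __name__ == "__main__":
    print(submit_cubes([1, 2, 3]))  # prints [1, 8, 27]
```

Results are collected in submission order because each future is awaited in the order it was created, regardless of which task finishes first.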

    It does have some quirks: functions and classes must be picklable, and the setup to get things running smoothly across multiple hosts can be tedious if they don't all share the same filesystem layout. Overall, though, I'm quite happy with the results. For our purposes, doing quite a bit of NumPy and Cython work, it provides excellent performance.
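The picklability requirement can be checked up front with the standard `pickle` module. The sketch below is one way to do that, not something taken from SCOOP; `is_picklable` is a hypothetical helper, and it assumes the default pickle protocol is representative of what gets sent between processes.

```python
import math
import pickle


def is_picklable(obj):
    """Hypothetical helper: True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Different Python versions raise different errors for
        # lambdas, nested functions, and unsupported types.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(math.sqrt))    # importable by name: prints True
    print(is_picklable(lambda x: x))  # anonymous function: prints False
```

Functions are pickled by reference to their module and name, which is why a named module-level function passes and a lambda does not. In practice this means defining worker functions at the top level of a module.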

    Hope this helps.
