Python Multiprocessing with Distributed Cluster

清酒与你 2020-12-04 12:38

I am looking for a Python package that can do multiprocessing not just across the cores of a single computer, but also across a cluster distributed over multiple machines.

4 Answers
  •  予麋鹿 (OP)
    2020-12-04 13:02

    Have you looked at Disco?

    Features:

    • Map / Reduce paradigm
    • Python programming
    • Distributed shared disk
    • SSH as the underlying transport
    • web and console interfaces
    • easy to add/block/delete a node
    • the master launches slave nodes without user intervention
    • slave nodes are automatically restarted on failure
    • good documentation: following the Install Guide, I was able to launch a two-machine cluster in a few minutes. The only extra step was creating the $DISCO_HOME/root folder so I could connect to the web UI, which I guess was needed because of an error creating the log file.

    A simple example from Disco's documentation:

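    from disco.core import Job, result_iterator
    
    # Map step: emit a (word, 1) pair for every word in an input line.
    def map(line, params):
        for word in line.split():
            yield word, 1
    
    # Reduce step: sort the pairs so equal words are adjacent, group them
    # with kvgroup, and emit each word together with its total count.
    def reduce(iter, params):
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)
    
    if __name__ == '__main__':
        # Submit the job to the Disco master with a single input URL.
        job = Job().run(input=["http://discoproject.org/media/text/chekhov.txt"],
                        map=map,
                        reduce=reduce)
        # Wait for the job to finish, then iterate over the (word, count) results.
        for word, count in result_iterator(job.wait(show=True)):
            print(word, count)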
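
    To make the Map / Reduce flow concrete without a cluster, here is a minimal local sketch. It is not Disco's API: it assumes only Python's standard `multiprocessing.Pool` and runs the same map, group-by-word, and reduce steps across the cores of one machine. The names `map_line`, `reduce_word`, and `word_count` (and its `processes` parameter) are hypothetical. Disco spreads these steps over cluster nodes and also handles input distribution and node restarts, which this sketch does not.

```python
# Hypothetical local stand-in for the Disco word-count job: it runs the
# map -> group by key -> reduce steps with the standard library's
# multiprocessing.Pool on a single machine. This is NOT Disco's API.
from collections import defaultdict
from multiprocessing import Pool


def map_line(line):
    """Map step: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]


def reduce_word(item):
    """Reduce step: sum all counts collected for one word."""
    word, counts = item
    return word, sum(counts)


def word_count(lines, processes=2):
    """Count words in `lines` using a pool of local worker processes."""
    with Pool(processes) as pool:
        # Run the map step on each line in parallel worker processes.
        mapped = pool.map(map_line, lines)
        # Group step (in the parent process): collect every count per word.
        groups = defaultdict(list)
        for pairs in mapped:
            for word, count in pairs:
                groups[word].append(count)
        # Run the reduce step on each word's group in parallel.
        reduced = pool.map(reduce_word, sorted(groups.items()))
    return dict(reduced)


if __name__ == "__main__":
    text = ["the cat sat", "the dog sat down"]
    for word, count in sorted(word_count(text).items()):
        print(word, count)
```

    Running it prints each word with its count (cat 1, dog 1, down 1, sat 2, the 2). Here the grouping happens centrally in the parent process; in the Disco example, each `reduce` call does its own grouping with `kvgroup(sorted(iter))`.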
    
