Apply a method to a list of objects in parallel using multi-processing

夕颜 2021-01-01 18:58

I have created a class with a number of methods. One of the methods, my_process, is very time-consuming, and I'd like to run it in parallel. I came acro

  •  梦谈多话
    2021-01-01 19:47

    I'm going to go against the grain here and suggest sticking to the simplest thing that could possibly work ;-) That is, Pool.map()-like functions are ideal for this, but they are restricted to passing a single argument. Rather than make heroic efforts to worm around that, simply write a helper function that takes a single argument: a tuple. Then it's all easy and clear.

    Here's a complete program taking that approach, which prints what you want under Python 2, regardless of OS:

    import multiprocessing as mp

    NUM_CORE = 4  # set to the number of cores you want to use


    class MyClass(object):
        def __init__(self, input):
            self.input = input
            self.result = None  # set by my_process()

        def my_process(self, multiply_by, add_to):
            self.result = self.input * multiply_by
            self._my_sub_process(add_to)
            return self.result

        def _my_sub_process(self, add_to):
            self.result += add_to


    def worker(arg):
        # Pool.map passes exactly one argument, so unpack the tuple here.
        # obj is a pickled copy of the parent's object.
        obj, m, a = arg
        return obj.my_process(m, a)


    if __name__ == "__main__":
        list_of_numbers = list(range(0, 5))
        list_of_objects = [MyClass(i) for i in list_of_numbers]

        pool = mp.Pool(NUM_CORE)
        list_of_results = pool.map(worker, ((obj, 100, 1) for obj in list_of_objects))
        pool.close()
        pool.join()

        print(list_of_numbers)  # [0, 1, 2, 3, 4]
        print(list_of_results)  # [1, 101, 201, 301, 401]
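    One consequence of this design is worth spelling out: Pool.map() pickles each tuple, so worker() runs my_process() on a copy of the object inside a child process. The result attribute set there never reaches the objects held by the parent; only the return value comes back. The sketch below demonstrates that, and shows one option if you do want the updated objects: return the object itself. The names worker_returning_obj and run are hypothetical, added only for this illustration.

```python
import multiprocessing as mp


class MyClass(object):
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to


def worker(arg):
    obj, m, a = arg
    return obj.my_process(m, a)


def worker_returning_obj(arg):
    # Hypothetical variant: send the modified copy back to the parent.
    obj, m, a = arg
    obj.my_process(m, a)
    return obj


def run():
    objects = [MyClass(i) for i in range(5)]
    pool = mp.Pool(2)
    results = pool.map(worker, [(obj, 100, 1) for obj in objects])
    # The workers only saw copies, so the parent's objects are unchanged.
    untouched = [obj.result for obj in objects]
    updated = pool.map(worker_returning_obj, [(obj, 100, 1) for obj in objects])
    pool.close()
    pool.join()
    return results, untouched, [obj.result for obj in updated]


if __name__ == "__main__":
    results, untouched, updated = run()
    print(results)    # [1, 101, 201, 301, 401]
    print(untouched)  # [None, None, None, None, None]
    print(updated)    # [1, 101, 201, 301, 401]
```

    Returning the whole object costs an extra round of pickling per task, so it only pays off when you really need the object's state afterwards.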
    

    A bit of magic

    I should note there are many advantages to taking the very simple approach I suggest. Beyond the facts that it "just works" on Python 2 and 3, requires no changes to your classes, and is easy to understand, it also plays nicely with all of the Pool methods.
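    To illustrate that last point, here is a minimal sketch that reuses the same one-argument worker() with Pool.imap_unordered(), Pool.imap() and Pool.apply_async(). The run() helper is hypothetical, added only to make the example self-contained and checkable.

```python
import multiprocessing as mp


class MyClass(object):
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to


def worker(arg):
    obj, m, a = arg
    return obj.my_process(m, a)


def run():
    objects = [MyClass(i) for i in range(5)]
    tasks = [(obj, 100, 1) for obj in objects]
    pool = mp.Pool(2)
    # imap_unordered yields results as they finish, in no guaranteed order,
    # so sort them before comparing.
    unordered = sorted(pool.imap_unordered(worker, tasks))
    # imap yields results lazily but keeps the input order.
    ordered = list(pool.imap(worker, tasks))
    # apply_async submits a single call; its args tuple holds our one tuple.
    single = pool.apply_async(worker, ((objects[2], 100, 1),)).get()
    pool.close()
    pool.join()
    return unordered, ordered, single


if __name__ == "__main__":
    unordered, ordered, single = run()
    print(unordered)  # [1, 101, 201, 301, 401]
    print(ordered)    # [1, 101, 201, 301, 401]
    print(single)     # 201
```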

    However, if you have multiple methods you want to run in parallel, it can get a bit annoying to write a tiny worker function for each. So here's a tiny bit of "magic" to worm around that. Change worker() like so:

    def worker(arg):
        # arg is (obj, method_name, positional args...)
        obj, methname = arg[:2]
        return getattr(obj, methname)(*arg[2:])
    

    Now a single worker function suffices for any number of methods, with any number of arguments. In your specific case, just change one line to match:

    list_of_results = pool.map(worker, ((obj, "my_process", 100, 1) for obj in list_of_objects))
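    For a self-contained check of the generic worker, here is a sketch that dispatches two different methods through the same function. The second method, power(), and the run() helper are hypothetical; they exist only to show that one worker handles methods with different argument counts.

```python
import multiprocessing as mp


class MyClass(object):
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

    def power(self, exponent):
        # Hypothetical extra method, only here as a second dispatch target.
        return self.input ** exponent


def worker(arg):
    # arg is (obj, method_name, positional args...)
    obj, methname = arg[:2]
    return getattr(obj, methname)(*arg[2:])


def run():
    objects = [MyClass(i) for i in range(5)]
    pool = mp.Pool(2)
    processed = pool.map(worker, ((obj, "my_process", 100, 1) for obj in objects))
    squared = pool.map(worker, ((obj, "power", 2) for obj in objects))
    pool.close()
    pool.join()
    return processed, squared


if __name__ == "__main__":
    processed, squared = run()
    print(processed)  # [1, 101, 201, 301, 401]
    print(squared)    # [0, 1, 4, 9, 16]
```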
    

    More-or-less obvious generalizations can also cater to methods with keyword arguments. But, in real life, I usually stick to the original suggestion. At some point catering to generalizations does more harm than good. Then again, I like obvious things ;-)
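    One possible form of that keyword-argument generalization, sketched under the assumption that each task is packed as (obj, method_name, args_tuple, kwargs_dict). That layout, and the run() helper, are choices made here for illustration, not something the answer prescribes.

```python
import multiprocessing as mp


class MyClass(object):
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to


def worker(arg):
    # Assumed task layout: (obj, method_name, args_tuple, kwargs_dict).
    obj, methname, args, kwargs = arg
    return getattr(obj, methname)(*args, **kwargs)


def run():
    objects = [MyClass(i) for i in range(5)]
    pool = mp.Pool(2)
    # All arguments passed by keyword.
    by_keyword = pool.map(
        worker,
        [(obj, "my_process", (), {"multiply_by": 100, "add_to": 1}) for obj in objects],
    )
    # Positional and keyword arguments mixed.
    mixed = pool.map(
        worker,
        [(obj, "my_process", (10,), {"add_to": 5}) for obj in objects],
    )
    pool.close()
    pool.join()
    return by_keyword, mixed


if __name__ == "__main__":
    by_keyword, mixed = run()
    print(by_keyword)  # [1, 101, 201, 301, 401]
    print(mixed)       # [5, 15, 25, 35, 45]
```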
