multiprocessing with large data

情深已故 2020-12-05 18:34

I am using multiprocessing.Pool() to parallelize some heavy computations.

The target function returns a lot of data (a huge list). I'm running out of RAM.

3 Answers
  •  情深已故 · 2020-12-05 19:01

    If your tasks can return data in chunks… can they be broken up into smaller tasks, each of which returns a single chunk? Obviously, this isn't always possible. When it isn't, you have to use some other mechanism (like a Queue, as Loren Abrams suggests; see the sketch at the end of this answer). But when it is, it's probably a better solution for other reasons, as well as solving this problem.

    With your example, this is certainly doable. For example:

    from multiprocessing import Pool

    def target_fnc(arg, low, high):
        result = []
        for i in range(low, high):
            result.append('dvsdbdfbngd')  # <== would like to just use yield!
        return result

    def target_fnc_star(packed_args):
        # imap_unordered passes each task as a single object, so unpack [arg, low, high] here
        return target_fnc(*packed_args)

    def process_args(some_args):
        pool = Pool(16)
        pool_args = []
        for low in range(0, 1000000, 10000):
            pool_args.extend([args, low, low + 10000] for args in some_args)
        for result in pool.imap_unordered(target_fnc_star, pool_args):
            for element in result:
                yield element

    (You could of course replace the loop with a nested comprehension, or a zip and flatten, if you prefer.)
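
    For instance, the nested-comprehension version of building pool_args would look like this (same tasks, same order as the loop above):

    pool_args = [[args, low, low + 10000]
                 for low in range(0, 1000000, 10000)
                 for args in some_args]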

    So, if some_args is [1, 2, 3], you'll get 300 tasks—[[1, 0, 10000], [2, 0, 10000], [3, 0, 10000], [1, 10000, 20000], …], each of which only returns 10000 elements instead of 1000000.
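
    As a usage sketch (handle() is just a hypothetical stand-in for whatever per-element work you do), you'd consume the generator element by element, so the parent only ever deals with 10000-element results instead of million-element ones:

    for element in process_args([1, 2, 3]):
        handle(element)  # hypothetical per-element work; nothing accumulates in the parent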
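
    For completeness, here's a rough sketch of the Queue-based mechanism mentioned at the top of this answer, assuming you manage the worker processes yourself instead of using a Pool (worker and process_args_queue are made-up names for this sketch): each worker puts small chunks on a multiprocessing.Queue, plus a None sentinel when it's done, so the parent never receives one huge list.

    from multiprocessing import Process, Queue

    def worker(q, arg):
        # Build results in small chunks and ship each chunk as soon as it's full.
        chunk = []
        for i in range(1000000):
            chunk.append('dvsdbdfbngd')  # arg is unused in this toy example, mirroring target_fnc
            if len(chunk) == 10000:
                q.put(chunk)
                chunk = []
        if chunk:
            q.put(chunk)
        q.put(None)  # sentinel: this worker is finished

    def process_args_queue(some_args):
        q = Queue()
        procs = [Process(target=worker, args=(q, arg)) for arg in some_args]
        for p in procs:
            p.start()
        finished = 0
        while finished < len(procs):
            chunk = q.get()
            if chunk is None:
                finished += 1
            else:
                for element in chunk:
                    yield element
        for p in procs:
            p.join()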
