multiprocessing with large data

情深已故 2020-12-05 18:34

I am using multiprocessing.Pool() to parallelize some heavy computations.

The target function returns a lot of data (a huge list). I'm running out of RAM.

3 Answers
  •  情深已故 · 2020-12-05 19:01

    If your tasks can return data in chunks… can they be broken up into smaller tasks, each of which returns a single chunk? Obviously, this isn't always possible. When it isn't, you have to use some other mechanism (like a Queue, as Loren Abrams suggests; see the sketch at the end of this answer). But when it is, it's probably a better solution for other reasons, as well as solving this problem.

    With your example, this is certainly doable. For example:

    from multiprocessing import Pool

    def target_fnc(arg, low, high):
        result = []
        for i in range(low, high):
            result.append('dvsdbdfbngd')  # <== would like to just use yield!
        return result

    def target_fnc_star(packed_args):
        # imap_unordered passes each task as a single object, so unpack [arg, low, high] here
        return target_fnc(*packed_args)

    def process_args(some_args):
        pool = Pool(16)
        pool_args = []
        for low in range(0, 1000000, 10000):
            pool_args.extend([args, low, low + 10000] for args in some_args)
        for result in pool.imap_unordered(target_fnc_star, pool_args):
            for element in result:
                yield element

    (You could of course replace the loop with a nested comprehension, or a zip and flatten, if you prefer.)
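
    For instance, the nested-comprehension version of building pool_args would look like this (same tasks, same order as the loop above):

    pool_args = [[args, low, low + 10000]
                 for low in range(0, 1000000, 10000)
                 for args in some_args]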

    So, if some_args is [1, 2, 3], you'll get 300 tasks—[[1, 0, 10000], [2, 0, 10000], [3, 0, 10000], [1, 10000, 20000], …], each of which only returns 10000 elements instead of 1000000.
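
    As a usage sketch (handle() is just a hypothetical stand-in for whatever per-element work you do), you'd consume the generator element by element, so the parent only ever deals with 10000-element results instead of million-element ones:

    for element in process_args([1, 2, 3]):
        handle(element)  # hypothetical per-element work; nothing accumulates in the parent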
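
    For completeness, here's a rough sketch of the Queue-based mechanism mentioned at the top of this answer, assuming you manage the worker processes yourself instead of using a Pool (worker and process_args_queue are made-up names for this sketch): each worker puts small chunks on a multiprocessing.Queue, plus a None sentinel when it's done, so the parent never receives one huge list.

    from multiprocessing import Process, Queue

    def worker(q, arg):
        # Build results in small chunks and ship each chunk as soon as it's full.
        chunk = []
        for i in range(1000000):
            chunk.append('dvsdbdfbngd')  # arg is unused in this toy example, mirroring target_fnc
            if len(chunk) == 10000:
                q.put(chunk)
                chunk = []
        if chunk:
            q.put(chunk)
        q.put(None)  # sentinel: this worker is finished

    def process_args_queue(some_args):
        q = Queue()
        procs = [Process(target=worker, args=(q, arg)) for arg in some_args]
        for p in procs:
            p.start()
        finished = 0
        while finished < len(procs):
            chunk = q.get()
            if chunk is None:
                finished += 1
            else:
                for element in chunk:
                    yield element
        for p in procs:
            p.join()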
