multiprocessing with large data

情深已故 2020-12-05 18:34

I am using multiprocessing.Pool() to parallelize some heavy computations.

The target function returns a lot of data (a huge list). I'm running out of RAM.

3 Answers
  •  轻奢々
    2020-12-05 19:06

    From your description, it sounds like you're not so much interested in processing the data as they come in, as in avoiding passing a million-element list back.

    There's a simpler way of doing that: Just put the data into a file. For example:

    import os
    import tempfile
    from multiprocessing import Pool

    def target_fnc(arg):
        # Write the results to a temp file instead of returning a huge list
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, 'w') as f:
            for i in range(1000000):
                f.write('dvsdbdfbngd\n')
        return path

    def process_args(some_args):
        pool = Pool(16)
        for result in pool.imap_unordered(target_fnc, some_args):
            with open(result) as f:
                for element in f:
                    yield element
            os.remove(result)  # clean up the temp file once it's been consumed
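    A caller would then just iterate over the generator; list_of_args and do_something_with below are placeholders:

    for element in process_args(list_of_args):
        do_something_with(element)  # placeholder for whatever you do with each result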
    

    Obviously, if your results can contain newlines, or aren't strings, etc., you'll want to use a csv file, a numpy file, etc. instead of a simple text file, but the idea is the same.
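    For instance, here is a minimal sketch of the same pattern using the csv module (the row contents and the Pool size are just placeholders); opening the files with newline='' is what lets csv round-trip embedded newlines safely:

    import csv
    import os
    import tempfile
    from multiprocessing import Pool

    def target_fnc(arg):
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, 'w', newline='') as f:
            writer = csv.writer(f)
            for i in range(1000000):
                # placeholder row: any mix of strings/numbers, even with newlines
                writer.writerow([arg, i, 'text with\na newline'])
        return path

    def process_args(some_args):
        with Pool(16) as pool:
            for path in pool.imap_unordered(target_fnc, some_args):
                with open(path, newline='') as f:
                    for row in csv.reader(f):
                        yield row
                os.remove(path)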

    That being said, even if this is simpler, there are usually benefits to processing the data a chunk at a time, so breaking up your tasks or using a Queue (as the other two answers suggest) may be better, if the downsides (respectively, needing a way to break the tasks up, or having to be able to consume the data as fast as they're produced) are not deal-breakers.
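    For the Queue route, a rough sketch (not taken from the other answers; the worker body and the None sentinel are just illustrative assumptions) might look like:

    import multiprocessing as mp

    def worker(arg, queue):
        # push results as they are produced instead of building one big list
        for i in range(1000000):
            queue.put('dvsdbdfbngd')
        queue.put(None)  # sentinel: this worker is finished

    def process_args(some_args):
        queue = mp.Queue()
        procs = [mp.Process(target=worker, args=(arg, queue)) for arg in some_args]
        for p in procs:
            p.start()
        finished = 0
        while finished < len(procs):
            item = queue.get()
            if item is None:
                finished += 1
            else:
                yield item
        for p in procs:
            p.join()

    The key point is to drain the queue before joining the workers, so a full queue can't deadlock the producers.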
