multiprocessing with large data

情深已故 2020-12-05 18:34

I am using multiprocessing.Pool() to parallelize some heavy computations.

The target function returns a lot of data (a huge list). I'm running out of RAM.

3 Answers
  •  轻奢々
    2020-12-05 19:06

    From your description, it sounds like you're not so much interested in processing the data as they come in, as in avoiding passing a million-element list back.

    There's a simpler way of doing that: Just put the data into a file. For example:

    import os
    import tempfile
    from multiprocessing import Pool

    def target_fnc(arg):
        # Write the results to a temp file instead of returning a huge list
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, 'w') as f:
            for i in range(1000000):
                f.write('dvsdbdfbngd\n')
        return path

    def process_args(some_args):
        pool = Pool(16)
        for result in pool.imap_unordered(target_fnc, some_args):
            with open(result) as f:
                for element in f:
                    yield element
            os.remove(result)  # clean up the temp file once it's been consumed
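    A caller would then just iterate over the generator; list_of_args and do_something_with below are placeholders:

    for element in process_args(list_of_args):
        do_something_with(element)  # placeholder for whatever you do with each result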
    

    Obviously, if your results can contain newlines, or aren't strings, etc., you'll want to use a csv file, a numpy file, etc. instead of a simple text file, but the idea is the same.
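    For instance, here is a minimal sketch of the same pattern using the csv module (the row contents and the Pool size are just placeholders); opening the files with newline='' is what lets csv round-trip embedded newlines safely:

    import csv
    import os
    import tempfile
    from multiprocessing import Pool

    def target_fnc(arg):
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, 'w', newline='') as f:
            writer = csv.writer(f)
            for i in range(1000000):
                # placeholder row: any mix of strings/numbers, even with newlines
                writer.writerow([arg, i, 'text with\na newline'])
        return path

    def process_args(some_args):
        with Pool(16) as pool:
            for path in pool.imap_unordered(target_fnc, some_args):
                with open(path, newline='') as f:
                    for row in csv.reader(f):
                        yield row
                os.remove(path)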

    That being said, even if this is simpler, there are usually benefits to processing the data a chunk at a time, so breaking up your tasks or using a Queue (as the other two answers suggest) may be better, if the downsides (respectively, needing a way to break the tasks up, or having to be able to consume the data as fast as they're produced) are not deal-breakers.
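    For the Queue route, a rough sketch (not taken from the other answers; the worker body and the None sentinel are just illustrative assumptions) might look like:

    import multiprocessing as mp

    def worker(arg, queue):
        # push results as they are produced instead of building one big list
        for i in range(1000000):
            queue.put('dvsdbdfbngd')
        queue.put(None)  # sentinel: this worker is finished

    def process_args(some_args):
        queue = mp.Queue()
        procs = [mp.Process(target=worker, args=(arg, queue)) for arg in some_args]
        for p in procs:
            p.start()
        finished = 0
        while finished < len(procs):
            item = queue.get()
            if item is None:
                finished += 1
            else:
                yield item
        for p in procs:
            p.join()

    The key point is to drain the queue before joining the workers, so a full queue can't deadlock the producers.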
