I am using multiprocessing.Pool()
to parallelize some heavy computations.
The target function returns a lot of data (a huge list). I'm running out of RAM.
From your description, it sounds like you're not so much interested in processing the data as they come in, as in avoiding passing a million-element list back.
There's a simpler way of doing that: Just put the data into a file. For example:
import os
import tempfile
from multiprocessing import Pool

def target_fnc(arg):
    # Write the results to a temp file and return just the path,
    # so the huge list never gets pickled back to the parent process.
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, 'w') as f:
        for i in range(1000000):
            f.write('dvsdbdfbngd\n')
    return path

def process_args(some_args):
    pool = Pool(16)
    for result in pool.imap_unordered(target_fnc, some_args):
        with open(result) as f:
            for element in f:
                yield element
        # (you may want to os.remove(result) here once the file is consumed)
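Since process_args is a generator, you consume it like any other iterable. A minimal sketch of that (some_args and do_something_with are placeholders, not names from your question):

for element in process_args(some_args):
    do_something_with(element)  # hypothetical consumer; only one temp file is read at a time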
Obviously if your results can contain newlines, or aren't strings, etc., you'll want to use a csv file, a numpy binary file, etc. instead of a simple text file, but the idea is the same.
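For example, if each result is numeric, the same pattern works with numpy's binary format. This is just a sketch under the assumption that a result fits in an ndarray; the array contents below are a placeholder for your real computation:

import os
import tempfile
from multiprocessing import Pool

import numpy as np

def target_fnc(arg):
    result = np.arange(1000000, dtype=np.float64)  # placeholder for the real work
    fd, path = tempfile.mkstemp(suffix='.npy')
    os.close(fd)              # np.save reopens the file by path
    np.save(path, result)
    return path

def process_args(some_args):
    pool = Pool(16)
    for path in pool.imap_unordered(target_fnc, some_args):
        data = np.load(path)  # load one task's array at a time
        os.remove(path)       # clean up the temp file
        yield data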
That being said, even if this is simpler, there are usually benefits to processing the data a chunk at a time. So breaking up your tasks or using a Queue (as the other two answers suggest) may be better, as long as the respective downsides (needing a way to break the tasks up, or having to consume the data as fast as they're produced) aren't deal-breakers.
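For reference, here is one way the Queue version could look; this is a hedged sketch, not the code from the other answers. It passes a Manager().Queue() proxy into the workers (a plain multiprocessing.Queue can't be handed to Pool workers as an argument) and uses a None sentinel per task; error handling and backpressure are omitted:

from multiprocessing import Pool, Manager

def target_fnc(arg, queue):
    # Push results one element at a time instead of returning a giant list.
    for i in range(1000000):
        queue.put('dvsdbdfbngd')
    queue.put(None)                   # sentinel: this task is finished

def process_args(some_args):
    some_args = list(some_args)
    manager = Manager()
    queue = manager.Queue()
    pool = Pool(16)
    for arg in some_args:
        pool.apply_async(target_fnc, (arg, queue))
    finished = 0
    while finished < len(some_args):  # keep reading until every task has sent its sentinel
        item = queue.get()
        if item is None:
            finished += 1
        else:
            yield item
    pool.close()
    pool.join()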