I am using multiprocessing.Pool()
to parallelize some heavy computations.
The target function returns a lot of data (a huge list). I'm running out of RAM.
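Schematically, something like this (a simplified sketch; do_something_with is just a stand-in for the real consumer):

from multiprocessing import Pool

def target_fnc(arg):
    result = []
    for i in xrange(1000000):
        result.append('dvsdbdfbngd')  # the entire huge list is built in memory
    return result

def process_args(some_args):
    pool = Pool(16)
    for result in pool.imap_unordered(target_fnc, some_args):
        for element in result:
            do_something_with(element)  # hypothetical consumer, not defined here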
If your tasks can return data in chunks, can they be broken up into smaller tasks, each of which returns a single chunk? Obviously, this isn't always possible. When it isn't, you have to use some other mechanism (like a Queue, as Loren Abrams suggests). But when it is, it's probably a better solution for other reasons, as well as solving this problem.
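For reference, a Queue-based version might look something like this. It's only a sketch of the idea, not Loren Abrams' exact code; the chunk size, the None sentinel, and the queue_worker name are all my own assumptions:

from multiprocessing import Process, Queue

def queue_worker(arg, q, chunk_size=10000):
    # stream results out in chunks instead of building one huge list
    chunk = []
    for i in xrange(1000000):
        chunk.append('dvsdbdfbngd')
        if len(chunk) == chunk_size:
            q.put(chunk)
            chunk = []
    if chunk:
        q.put(chunk)
    q.put(None)  # sentinel: this worker is finished

def process_args(some_args):
    q = Queue()
    procs = [Process(target=queue_worker, args=(arg, q)) for arg in some_args]
    for p in procs:
        p.start()
    finished = 0
    while finished < len(procs):
        chunk = q.get()
        if chunk is None:
            finished += 1
        else:
            for element in chunk:
                yield element
    for p in procs:
        p.join()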
With your example, this is certainly doable. For example:
from multiprocessing import Pool

def target_fnc(args):
    # Pool.imap_unordered passes each task as a single argument, so unpack it here
    arg, low, high = args
    result = []
    for i in xrange(low, high):
        result.append('dvsdbdfbngd')  # <== would like to just use yield!
    return result
def process_args(some_args):
    pool = Pool(16)
    pool_args = []
    for low in range(0, 1000000, 10000):
        pool_args.extend((arg, low, low + 10000) for arg in some_args)
    for result in pool.imap_unordered(target_fnc, pool_args):
        for element in result:
            yield element
(You could of course replace the loop with a nested comprehension, or a zip and flatten, if you prefer.)
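For instance, the same thing as a comprehension, plus itertools.chain.from_iterable for the flattening (one way to do it; other flattening idioms work too):

from itertools import chain
from multiprocessing import Pool

def process_args(some_args):
    pool = Pool(16)
    # nested comprehension instead of the explicit argument-building loop
    pool_args = [(arg, low, low + 10000)
                 for low in range(0, 1000000, 10000)
                 for arg in some_args]
    # chain.from_iterable flattens the per-task chunks back into one stream
    for element in chain.from_iterable(pool.imap_unordered(target_fnc, pool_args)):
        yield element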
So, if some_args is [1, 2, 3], you'll get 300 tasks: [(1, 0, 10000), (2, 0, 10000), (3, 0, 10000), (1, 10000, 20000), …], each of which only returns 10000 elements instead of 1000000.