How can I make multiprocessing.pool.map distribute processes in numerical order?
More Info:
I have a program which processes a few thousand data files, mak
What about changing map to imap:
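import os
from multiprocessing import Pool
import time

num_proc = 4
num_calls = 20
sleeper = 0.1

def SomeFunc(arg):
    # Simulate some work, then report which worker process handled arg
    time.sleep(sleeper)
    print("%s %5d" % (os.getpid(), arg))
    return arg

# The __main__ guard is required on platforms that spawn worker processes (e.g. Windows)
if __name__ == "__main__":
    proc_pool = Pool(num_proc)
    list(proc_pool.imap(SomeFunc, range(num_calls)))
    proc_pool.close()
    proc_pool.join()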
The reason may be that the default chunksize of imap is 1, so each worker receives one task at a time instead of running ahead on a pre-assigned batch as it does with map.
This occurs because the iterable is split into chunks at the start of the call to map, and each process works through a whole chunk of tasks before taking the next one, so the amount of work a process is given up front depends on the chunksize. We can work out the default chunksize by looking at the source for pool.map:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
So for a range of 20 with 4 processes, divmod(20, 4 * 4) gives (1, 4); because the remainder is nonzero, the chunksize is rounded up to 2.
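To make the arithmetic easy to try for other sizes, here is a standalone sketch of the same rule. The helper name default_chunksize is hypothetical, and it takes plain counts instead of reading len(self._pool) from a live pool:

```python
def default_chunksize(num_items, num_workers):
    """Reproduce the default chunksize rule quoted above from Pool.map."""
    chunksize, extra = divmod(num_items, num_workers * 4)
    if extra:
        # Round up so that all items fit into num_workers * 4 chunks or fewer
        chunksize += 1
    return chunksize

print(default_chunksize(20, 4))    # 2
print(default_chunksize(3000, 4))  # 188
```

The second call illustrates why ordering drifts much further with a few thousand files: with 4 processes, each worker takes a run of 188 consecutive items before picking up the next chunk.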
If we modify your code to pass this chunksize explicitly, we should get results similar to those you are getting now:
proc_pool.map(SomeFunc, range(num_calls), chunksize=2)
This yields the output:
0 2 6 4 1 7 5 3 8 10 12 14 9 13 15 11 16 18 17 19
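The pattern in this output follows from how the work is grouped. The sketch below uses a hypothetical helper, split_into_chunks, which is a simplified illustration rather than Pool's internal code, to split range(20) into chunks of 2. With 4 workers, each one starts on one of the first four chunks, so the first values printed are the chunk starts 0, 2, 4, 6 (in whichever order the workers happen to print), followed by the second elements 1, 3, 5, 7:

```python
def split_into_chunks(items, chunksize):
    """Group items into consecutive tuples of length chunksize (the last may be shorter)."""
    items = list(items)
    return [tuple(items[i:i + chunksize]) for i in range(0, len(items), chunksize)]

chunks = split_into_chunks(range(20), 2)
print(chunks[:4])                  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print([c[0] for c in chunks[:4]])  # [0, 2, 4, 6]
```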
Setting chunksize=1 instead means that each process within the pool is given only one task at a time.
proc_pool.map(SomeFunc, range(num_calls), chunksize=1)
This gives a much closer numerical ordering than leaving chunksize unspecified, although the order is still not strictly guaranteed because tasks can finish at slightly different times. For example, a chunksize of 1 yields the output:
0 1 2 3 4 5 6 7 9 10 8 11 13 12 15 14 16 17 19 18
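Both outputs can be reproduced with an idealized model. The sketch below is a simulation, not real multiprocessing: it assumes every task takes exactly one time unit, that free workers pull whole chunks from a first-in, first-out queue, and that ties are broken by worker index. The function name simulate_start_order is hypothetical:

```python
import heapq

def simulate_start_order(num_calls, num_proc, chunksize):
    """Return task arguments in the order an idealized pool would start them."""
    chunks = [list(range(i, min(i + chunksize, num_calls)))
              for i in range(0, num_calls, chunksize)]
    # Min-heap of (time the worker becomes free, worker index)
    free_at = [(0, w) for w in range(num_proc)]
    heapq.heapify(free_at)
    events = []  # (start time, worker index, task argument)
    for chunk in chunks:
        # The earliest-free worker takes the next chunk and runs it sequentially
        t, w = heapq.heappop(free_at)
        for offset, arg in enumerate(chunk):
            events.append((t + offset, w, arg))
        heapq.heappush(free_at, (t + len(chunk), w))
    events.sort()
    return [arg for _, _, arg in events]

print(simulate_start_order(20, 4, 2))
print(simulate_start_order(20, 4, 1))
```

With chunksize=2 the model starts tasks in the groups 0 2 4 6 | 1 3 5 7 | 8 10 12 14 | 9 11 13 15 | 16 18 | 17 19, the same groups as the measured output above up to the order within each group. With chunksize=1 it gives 0 to 19 in order; in practice small timing differences between workers cause the occasional swap such as 9 10 8.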