When you map an iterable to a multiprocessing.Pool, are the iterations divided into a queue for each process in the pool at the start, or is there a common queue from which a task is taken when a process comes free?
To estimate the chunksize a Python implementation uses without reading its multiprocessing module source code, run:
#!/usr/bin/env python
import multiprocessing as mp
from itertools import groupby

def work(index):
    mp.get_logger().info(index)
    return index, mp.current_process().name

if __name__ == "__main__":
    import logging
    import sys

    logger = mp.log_to_stderr()
    # process cmdline args
    try:
        sys.argv.remove('--verbose')
    except ValueError:
        pass  # not verbose
    else:
        logger.setLevel(logging.INFO)  # verbose
    nprocesses, nitems = int(sys.argv.pop(1)), int(sys.argv.pop(1))
    # choices: 'map', 'imap', 'imap_unordered'
    map_name = sys.argv[1] if len(sys.argv) > 1 else 'map'
    kwargs = dict(chunksize=int(sys.argv[2])) if len(sys.argv) > 2 else {}

    # estimate chunksize used
    max_chunksize = 0
    map_func = getattr(mp.Pool(nprocesses), map_name)
    for _, group in groupby(sorted(map_func(work, range(nitems), **kwargs),
                                   key=lambda x: x[0]),  # sort by index
                            key=lambda x: x[1]):  # group by process name
        max_chunksize = max(max_chunksize, len(list(group)))
    print("%s: max_chunksize %d" % (map_name, max_chunksize))
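To see what the sort-then-group step measures, here is the same logic run on hypothetical (index, process name) pairs (the worker names and chunk layout below are made up for illustration, not output from the script):

```python
from itertools import groupby

# Hypothetical results: indices 0-3 handled by worker 'w1', 4-7 by 'w2',
# i.e. each worker received one chunk of 4 consecutive items.
results = [(0, 'w1'), (1, 'w1'), (2, 'w1'), (3, 'w1'),
           (4, 'w2'), (5, 'w2'), (6, 'w2'), (7, 'w2')]

max_chunksize = 0
# sorting by index makes each chunk a consecutive run of one worker's name,
# so groupby over the name yields one group per chunk
for _, group in groupby(sorted(results), key=lambda x: x[1]):
    max_chunksize = max(max_chunksize, len(list(group)))
print(max_chunksize)  # -> 4
```

This is only an estimate: if one worker happens to receive two adjacent chunks, they merge into a single run and the reported maximum overstates the real chunksize.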
The script shows that imap and imap_unordered use chunksize=1 by default, while the default chunksize for map depends on nprocesses and nitems (the number of chunks per process is not fixed) and on the Python version. All *map* functions honor the chunksize parameter if it is specified explicitly.
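For reference, CPython's Pool.map derives its default chunksize roughly like this (a sketch based on CPython's multiprocessing/pool.py; the exact formula may differ between versions):

```python
def default_map_chunksize(nitems, nprocesses):
    # split the items into roughly 4 chunks per worker process,
    # rounding up so every item lands in some chunk
    chunksize, extra = divmod(nitems, nprocesses * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_map_chunksize(100, 4))   # 100 items, 4 workers -> 7
print(default_map_chunksize(10, 10))   # fewer items than chunk slots -> 1
```

This is why map's maximum observed chunksize grows with nitems and shrinks as nprocesses increases, as the estimation script reports.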
$ ./estimate_chunksize.py nprocesses nitems [map_name [chunksize]] [--verbose]
To see how individual jobs are distributed among processes, pass the --verbose flag.