I am using concurrent.futures.ProcessPoolExecutor to find the occurrence of a number from a number range. The intent is to investigate the amount of speed-up pe
You're comparing apples to oranges here. When using map you produce all the 1E8 numbers and transfer them to worker processes. This takes a lot of time compared to actual execution. When using submit you just create 6 sets of parameters that get transferred.
If you change map to operate with the same principle you'll get numbers that are close to each other:
def _findmatch(nmin, nmax, number):
'''Function to find the occurrence of number in range nmin to nmax and return
the found occurrences in a list.'''
print('\n def _findmatch', nmin, nmax, number)
start = time()
match=[]
for n in range(nmin, nmax):
if number in str(n):
match.append(n)
end = time() - start
print("found {0} in {1:.4f}sec".format(len(match),end))
return match
def _concurrent_map(nmax, number, workers):
'''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
find the occurrences of a given number in a number range in a parallelised
manner.'''
# 1. Local variables
start = time()
chunk = nmax // workers
futures = []
found =[]
#2. Parallelization
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
# 2.1. Discretise workload and submit to worker pool
cstart = (chunk * i for i in range(workers))
cstop = (chunk * i if i != workers else nmax for i in range(1, workers + 1))
futures = executor.map(_findmatch, cstart, cstop, itertools.repeat(number))
# 2.3. Consolidate result as a list and return this list.
for future in futures:
for f in future:
try:
found.append(f)
except:
print_exc()
foundsize = len(found)
end = time() - start
print('within statement of def _concurrent(nmax, number):')
print("found {0} in {1:.4f}sec".format(foundsize, end))
return found
You could improve the performance of submit by using as_completed correctly. For given iterable of futures it will return an iterator that will yield futures in the order they complete.
You could also skip the copying of the data to another array and use itertools.chain.from_iterable to combine the results from futures to single iterable:
import concurrent.futures as cf
import itertools
from time import time
from traceback import print_exc
from itertools import chain
def _findmatch(nmin, nmax, number):
'''Function to find the occurrence of number in range nmin to nmax and return
the found occurrences in a list.'''
print('\n def _findmatch', nmin, nmax, number)
start = time()
match=[]
for n in range(nmin, nmax):
if number in str(n):
match.append(n)
end = time() - start
print("found {0} in {1:.4f}sec".format(len(match),end))
return match
def _concurrent_map(nmax, number, workers):
'''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
find the occurrences of a given number in a number range in a parallelised
manner.'''
# 1. Local variables
chunk = nmax // workers
futures = []
found =[]
#2. Parallelization
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
# 2.1. Discretise workload and submit to worker pool
for i in range(workers):
cstart = chunk * i
cstop = chunk * (i + 1) if i != workers - 1 else nmax
futures.append(executor.submit(_findmatch, cstart, cstop, number))
return chain.from_iterable(f.result() for f in cf.as_completed(futures))
if __name__ == '__main__':
nmax = int(1E8) # Number range maximum.
number = str(5) # Number to be found in number range.
workers = 6 # Pool of workers
start = time()
a = _concurrent_map(nmax, number, workers)
end = time() - start
print('\n main')
print('workers = ', workers)
print("found {0} in {1:.4f}sec".format(sum(1 for x in a),end))