Python concurrent.futures.ProcessPoolExecutor: Performance of .submit() vs .map()

清歌不尽 2020-12-13 11:02

I am using concurrent.futures.ProcessPoolExecutor to find the occurrence of a number from a number range. The intent is to investigate the amount of speed-up pe

2 Answers
  •  借酒劲吻你
    2020-12-13 11:45

    You're comparing apples to oranges here. With map, you generate all 1E8 numbers and transfer each of them to the worker processes, and that transfer takes far longer than the actual computation. With submit, you create just 6 sets of parameters to transfer.
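
    To make the difference concrete, here is a rough sketch that counts work items and pickled argument bytes for both approaches. The helper names per_number_payload and per_chunk_payload are hypothetical, nmax is kept small so it runs quickly, and the byte counts are only illustrative: the executor also pickles the function reference and its own bookkeeping, which this sketch ignores.

```python
import pickle

def per_number_payload(nmax):
    """Work items and pickled bytes when every number is its own task."""
    items = range(nmax)
    total_bytes = sum(len(pickle.dumps(n)) for n in items)
    return len(items), total_bytes

def per_chunk_payload(nmax, workers, number):
    """Work items and pickled bytes when each task is a (start, stop, number) triple."""
    chunk = nmax // workers
    tasks = [(chunk * i, chunk * (i + 1) if i != workers - 1 else nmax, number)
             for i in range(workers)]
    total_bytes = sum(len(pickle.dumps(t)) for t in tasks)
    return len(tasks), total_bytes

if __name__ == '__main__':
    nmax, workers = 10**5, 6   # assumption: small nmax so the sketch runs quickly
    print('per number:', per_number_payload(nmax))
    print('per chunk: ', per_chunk_payload(nmax, workers, '5'))
```

    The per-number count grows linearly with nmax, while the per-chunk count stays at the number of workers.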

    If you change map to operate on the same principle, the timings come out close to each other:

    import concurrent.futures as cf
    import itertools
    from time import time

    def _findmatch(nmin, nmax, number):
        '''Function to find the occurrence of number in range nmin to nmax and return
           the found occurrences in a list.'''
        print('\n def _findmatch', nmin, nmax, number)
        start = time()
        match = []
        for n in range(nmin, nmax):
            if number in str(n):
                match.append(n)
        end = time() - start
        print("found {0} in {1:.4f}sec".format(len(match), end))
        return match

    def _concurrent_map(nmax, number, workers):
        '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
           find the occurrences of a given number in a number range in a parallelised
           manner.'''
        # 1. Local variables
        start = time()
        chunk = nmax // workers
        found = []
        # 2. Parallelization
        with cf.ProcessPoolExecutor(max_workers=workers) as executor:
            # 2.1. Discretise workload: one (start, stop) pair per worker
            cstart = (chunk * i for i in range(workers))
            cstop = (chunk * i if i != workers else nmax for i in range(1, workers + 1))
            # map() yields each call's return value (a list), not a Future
            results = executor.map(_findmatch, cstart, cstop, itertools.repeat(number))

            # 2.2. Consolidate the results as a single list and return this list.
            for result in results:
                found.extend(result)
            foundsize = len(found)
            end = time() - start
            print('within statement of def _concurrent_map(nmax, number, workers):')
            print("found {0} in {1:.4f}sec".format(foundsize, end))
        return found
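
    As a quick check of what the two generators feed into map, this sketch pairs them with zip, which is how Executor.map combines its iterables (element-wise, stopping at the shortest, so itertools.repeat is safe). The helper name paired_arguments is hypothetical and the small numbers are chosen only for readability.

```python
import itertools

def paired_arguments(nmax, workers, number):
    """Return the (start, stop, number) triples that map would hand to the workers."""
    chunk = nmax // workers
    cstart = (chunk * i for i in range(workers))
    # The last chunk is extended to nmax so the remainder is not lost
    cstop = (chunk * i if i != workers else nmax for i in range(1, workers + 1))
    return list(zip(cstart, cstop, itertools.repeat(number)))

if __name__ == '__main__':
    for args in paired_arguments(100, 6, '5'):
        print(args)
```

    With nmax = 100 and 6 workers, chunk is 16, so the last triple covers 80 to 100 and absorbs the remainder.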
    

    You could improve the performance of submit by using as_completed correctly. Given an iterable of futures, it returns an iterator that yields the futures in the order they complete.
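
    A minimal sketch of that behaviour follows. It uses ThreadPoolExecutor and sleep only to keep the demo lightweight, which is a substitution on my part; as_completed is used the same way with ProcessPoolExecutor futures. The names slow_echo and completion_order are hypothetical.

```python
import concurrent.futures as cf
from time import sleep

def slow_echo(delay):
    """Sleep for `delay` seconds, then return it."""
    sleep(delay)
    return delay

def completion_order(delays):
    """Submit one task per delay and collect results as each future finishes."""
    with cf.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(slow_echo, d) for d in delays]
        # Iterate inside the with block so results are handled while others still run
        return [f.result() for f in cf.as_completed(futures)]

if __name__ == '__main__':
    # Results normally arrive shortest delay first, not in submission order
    print(completion_order([0.3, 0.1, 0.2]))
```

    Processing each result as soon as its future finishes avoids waiting on the slowest task before starting any consolidation work.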

    You could also skip copying the data to another list and use itertools.chain.from_iterable to combine the results from the futures into a single iterable:

    import concurrent.futures as cf
    from itertools import chain
    from time import time
    
    def _findmatch(nmin, nmax, number):
        '''Function to find the occurrence of number in range nmin to nmax and return
           the found occurrences in a list.'''
        print('\n def _findmatch', nmin, nmax, number)
        start = time()
        match=[]
        for n in range(nmin, nmax):
            if number in str(n):
                match.append(n)
        end = time() - start
        print("found {0} in {1:.4f}sec".format(len(match),end))
        return match
    
    def _concurrent_submit(nmax, number, workers):
        '''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
           find the occurrences of a given number in a number range in a parallelised
           manner.'''
        # 1. Local variables
        chunk = nmax // workers
        futures = []
        # 2. Parallelization
        with cf.ProcessPoolExecutor(max_workers=workers) as executor:
            # 2.1. Discretise workload and submit to worker pool
            for i in range(workers):
                cstart = chunk * i
                cstop = chunk * (i + 1) if i != workers - 1 else nmax
                futures.append(executor.submit(_findmatch, cstart, cstop, number))

        # 2.2. Lazily chain the per-future result lists into one iterable.
        # Leaving the with block waits for all futures, so result() does not block here.
        return chain.from_iterable(f.result() for f in cf.as_completed(futures))

    if __name__ == '__main__':
        nmax = int(1E8) # Number range maximum.
        number = str(5) # Number to be found in number range.
        workers = 6     # Pool of workers

        start = time()
        a = _concurrent_submit(nmax, number, workers)
        end = time() - start
        print('\n main')
        print('workers = ', workers)
        # a is a one-shot iterator, so count its items instead of calling len()
        print("found {0} in {1:.4f}sec".format(sum(1 for x in a), end))
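
    One thing to keep in mind with chain.from_iterable is that it returns a one-shot iterator rather than a list, which is why the listing above counts items with sum instead of len. A small sketch with made-up per-worker lists (the name flatten_lazily is hypothetical):

```python
from itertools import chain

def flatten_lazily(list_of_lists):
    """Combine several lists into one iterator without copying the items."""
    return chain.from_iterable(list_of_lists)

if __name__ == '__main__':
    per_worker = [[5, 15], [25], [], [35, 45, 50]]  # made-up results, one list per worker
    combined = flatten_lazily(per_worker)
    print(sum(1 for _ in combined))   # counting consumes the iterator
    print(list(combined))             # nothing is left on a second pass
```

    If the results are needed more than once, wrap the call in list(...) and accept the extra copy.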
    
