How to parallelize iteration over a range, using StdLib and Python 3?

陌清茗 2020-12-11 21:19

I've been searching for an answer on this now for days to no avail. I'm probably just not understanding the pieces that are floating around out there and the Python documentation…

3 Answers
  •  甜味超标 2020-12-11 21:39

    Yes, that is doable. Your calculation is not dependent on intermediate results, so you can easily divide the task into chunks and distribute it over multiple processes. This is what is called an

    embarrassingly parallel problem.

    The only tricky part here might be dividing the range into fairly equal parts in the first place. Straight out of my personal lib, here are two functions to deal with this:

    # mp_utils.py
    
    from itertools import accumulate
    
    def calc_batch_sizes(n_tasks: int, n_workers: int) -> list:
        """Divide `n_tasks` optimally between n_workers to get batch_sizes.
    
        Guarantees batch sizes won't differ for more than 1.
    
        Example:
        # >>>calc_batch_sizes(23, 4)
        # Out: [6, 6, 6, 5]
    
        In case you're going to use numpy anyway, use np.array_split:
        [len(a) for a in np.array_split(np.arange(23), 4)]
        # Out: [6, 6, 6, 5]
        """
        x = n_tasks // n_workers  # floor division; int(n_tasks / n_workers) can lose precision for huge ints
        y = n_tasks % n_workers
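        # the first y workers get one extra task each, so sizes never differ by more than 1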
        batch_sizes = [x + (y > 0)] * y + [x] * (n_workers - y)
    
        return batch_sizes
    
    
    def build_batch_ranges(batch_sizes: list) -> list:
        """Build batch_ranges from list of batch_sizes.
    
        Example:
        # batch_sizes [6, 6, 6, 5]
        # >>>build_batch_ranges(batch_sizes)
        # Out: [range(0, 6), range(6, 12), range(12, 18), range(18, 23)]
        """
        upper_bounds = [*accumulate(batch_sizes)]
        lower_bounds = [0] + upper_bounds[:-1]
        batch_ranges = [range(l, u) for l, u in zip(lower_bounds, upper_bounds)]
    
        return batch_ranges
    
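    A quick check of the two helpers, chaining them on the values from the docstrings:

    from mp_utils import calc_batch_sizes, build_batch_ranges

    sizes = calc_batch_sizes(23, n_workers=4)
    print(sizes)                      # [6, 6, 6, 5]
    print(build_batch_ranges(sizes))  # [range(0, 6), range(6, 12), range(12, 18), range(18, 23)]
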

    Then your main script would look like this:

    import time
    from multiprocessing import Pool
    from mp_utils import calc_batch_sizes, build_batch_ranges
    
    
    def target_foo(batch_range):
        return sum(batch_range)  # ~ 6x faster than target_foo1
    
    
    def target_foo1(batch_range):
        numbers = []
        for num in batch_range:
            numbers.append(num)
        return sum(numbers)
    
    
    if __name__ == '__main__':
    
        N = 100000000
        N_CORES = 4
    
        batch_sizes = calc_batch_sizes(N, n_workers=N_CORES)
        batch_ranges = build_batch_ranges(batch_sizes)
    
        start = time.perf_counter()
        with Pool(N_CORES) as pool:
            result = pool.map(target_foo, batch_ranges)
            r_sum = sum(result)
        print(r_sum)
        print(f'elapsed: {time.perf_counter() - start:.2f} s')
    

    Note that I also swapped your for-loop for a simple sum over the range object, since it offers much better performance. If you can't do this in your real app, a list comprehension would still be ~60% faster than filling your list manually like in your example (see the sketch at the end of this answer).

    Example Output:

    4999999950000000
    elapsed: 0.51 s
    
    Process finished with exit code 0
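
    For reference, the list-comprehension variant mentioned above might look like this (a minimal sketch; the ~60% figure is the answerer's own benchmark, not re-measured here):

    def target_foo1(batch_range):
        # build the intermediate list with a comprehension instead of appending in a loop
        numbers = [num for num in batch_range]
        return sum(numbers)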
    
