How to parallel sum a loop using multiprocessing in Python

甜味超标 2020-12-18 01:10

I am having difficulty understanding how to use Python's multiprocessing module.

I have a sum from 1 to n where n=10^10, whic

3 Answers
  •  谎友^ (original poster)
    2020-12-18 01:27

    You can do this sum without multiprocessing at all, and it's probably simpler, if not faster, to just use generators.

    # Python 2: build a generator of xrange blocks, each covering 1000 consecutive integers
    >>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
    >>> list(xr)[:3]
    [xrange(1, 1001), xrange(1001, 2001), xrange(2001, 3001)]
    # list() exhausted the generator, so rebuild it, then sum using two map calls
    >>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
    >>> sum(map(sum, map(lambda x:x, xr)))
    50000000005000000000L
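
For readers on Python 3, where `xrange` no longer exists, the same chunked idea can be sketched roughly as follows. This is only an illustration, not the code that produced the output above: the helper names `chunked_ranges` and `chunked_sum` are made up here, and `n` is reduced to 10**6 so the check finishes quickly. The closed form n(n+1)/2 is used to verify the total.

```python
def chunked_ranges(n, chunk=1000):
    """Yield range objects that together cover 1..n in blocks of `chunk`."""
    for start in range(1, n + 1, chunk):
        # The last block is clipped so it never runs past n.
        yield range(start, min(start + chunk, n + 1))


def chunked_sum(n, chunk=1000):
    """Sum 1..n lazily, one block at a time, without building a list."""
    return sum(sum(block) for block in chunked_ranges(n, chunk))


n = 10**6  # small stand-in for 10**10 (assumption, to keep the run short)
total = chunked_sum(n)
print(total == n * (n + 1) // 2)  # prints True
```

The same closed form with n = 10**10 gives 50000000005000000000, which matches the result printed in the session above.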
    

    However, if you want to use multiprocessing, you can do that too. I'm using a fork of multiprocessing (from pathos) that is better at serialization, but otherwise not really different.

    >>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
    >>> import pathos
    >>> mmap = pathos.multiprocessing.ProcessingPool().map
    >>> tmap = pathos.multiprocessing.ThreadingPool().map
    >>> sum(tmap(sum, mmap(lambda x:x, xr)))
    50000000005000000000L
    

    The version without multiprocessing is faster and takes about a minute on my laptop. The multiprocessing version takes a few minutes due to the overhead of spawning multiple Python processes.
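
If you would rather stay with the standard library, a rough Python 3 sketch using `multiprocessing.Pool` might look like the one below. It is not the pathos code from this answer. The names `chunk_bounds`, `chunk_sum` and `parallel_sum`, the chunk size and the process count are all assumptions chosen for illustration, and `n` is kept small so it runs quickly.

```python
from multiprocessing import Pool


def chunk_bounds(n, chunk):
    """Return half-open (start, stop) pairs that together cover 1..n."""
    return [(start, min(start + chunk, n + 1))
            for start in range(1, n + 1, chunk)]


def chunk_sum(bounds):
    """Worker: sum the integers in one half-open interval."""
    start, stop = bounds
    return sum(range(start, stop))


def parallel_sum(n, chunk=250_000, processes=4):
    """Sum 1..n by handing each chunk to a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # Only the small (start, stop) tuples are pickled and sent to workers.
        return sum(pool.map(chunk_sum, chunk_bounds(n, chunk)))


if __name__ == "__main__":
    n = 10**6  # small stand-in for 10**10 (assumption)
    print(parallel_sum(n) == n * (n + 1) // 2)
```

Sending a few large chunks rather than many small ones keeps the per-task communication cost low, which is the overhead that tends to dominate for work this cheap. The `if __name__ == "__main__":` guard is needed on platforms that start workers by re-importing the main module.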

    If you are interested, get pathos here: https://github.com/uqfoundation
