python prime crunching: processing pool is slower?

依然范特西╮ 提交于 2019-12-04 03:22:36

the most efficient way to use multiprocessing is to divide the work into n equal sized chunks, with n the size of the pool, which should be approximately the number of cores on your system. The reason for this is that the work of starting subprocesses and communicating between them is quite large. If the size of the work is small compared to the number of work chunks, then the overhead of IPC becomes significant.

In your case, you're asking multiprocessing to process each prime individually. A better way to deal with the problem is to pass each worker a range of values, (probably just a start and end value) and have it return all of the primes in that range it found.

In the case of identifying large-ish primes, the work done grows with the starting value, and so You probably don't want to divide the total range into exactly n chunks, but rather n*k equal chunks, with k some reasonable, small number, say 10 - 100. that way, when some workers finish before others, there's more work left to do and it can be balanced efficiently across all workers.

Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.

#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = set()

def log(result):
    global primes

    if result:
        # since the result is a batch of primes, we have to use 
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime( n ):
    if n < 2:
        return False
    if n == 2:
        return True, n

    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)

    return primes



def main():

    global primes

    pool = Pool()

    # pick an arbitrary chunk size, this will give us 100 different 
    # chunks, but another value might be optimal
    step = 10000

    # use xrange instead of range, we don't actually need a list, just
    # the values in that range.
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values.
        pool.apply_async(isPrimeWorker,(i, i+step,), callback = log)

    pool.close()
    pool.join()

    print sum(primes)

    return

if __name__ == "__main__":
    main()
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!