Sieve of Eratosthenes - Primes between X and N

[亡魂溺海] submitted on 2019-11-26 17:52:43

Question


I found this highly optimised implementation of the Sieve of Eratosthenes for Python on Stack Overflow. I have a rough idea of what it's doing, but I must admit the details of its workings elude me.

I would still like to use it for a little project (I'm aware there are libraries to do this but I would like to use this function).

Here's the original:

'''
    Sieve of Eratosthenes 
    Implementation by Robert William Hanks      
    https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n/3035188
'''

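import numpy

def sieve(n):
    """Return an array of the primes below n."""
    # dtype=bool works on every NumPy version; the numpy.bool alias is missing from some releases.
    prime = numpy.ones(n//3 + (n%6==2), dtype=bool)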
    for i in range(3, int(n**.5) + 1, 3):
        if prime[i // 3]:
            p = (i + 1) | 1
            prime[       p*p//3     ::2*p] = False
            prime[p*(p-2*(i&1)+4)//3::2*p] = False
    result = (3 * prime.nonzero()[0] + 1) | 1
    result[0] = 3
    return numpy.r_[2,result]

What I'm trying to achieve is to modify it to return all primes below n starting at x so that:

primes = sieve(50, 100)

would return primes between 50 and 100. This seemed easy enough, I tried replacing these two lines:

def sieve(x, n):
    ...
    for i in range(x, int(n**.5) + 1, 3):
    ...

But for a reason I can't explain, the value of x in the above has no influence on the numpy array returned!

How can I modify sieve() to only return primes between x and n?


Answer 1:


The implementation you've borrowed is able to start at 3 because it replaces sieving out the multiples of 2 by just skipping all even numbers; that's what the 2*… that appear multiple times in the code are about. The fact that 3 is the next prime is also hardcoded in all over the place, but let's ignore that for the moment, because if you can't get past the special-casing of 2, the special-casing of 3 doesn't matter.
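To make the hardcoding concrete, here is a small sketch of how an index in that boolean array maps to a number, taken from the (3 * k + 1) | 1 expression in the borrowed code. The helper name slot_to_number is hypothetical and only for illustration.

```python
def slot_to_number(k):
    """Number represented by index k in the borrowed sieve's boolean array."""
    # Same expression the sieve applies to prime.nonzero()[0].
    return (3 * k + 1) | 1

numbers = [slot_to_number(k) for k in range(8)]
print(numbers)  # [1, 5, 7, 11, 13, 17, 19, 23]
```

Every slot stands for a number that is 1 or 5 modulo 6, so multiples of 2 and 3 never get a slot at all. Slot 0 stands for 1, which is why the code overwrites result[0] with 3 and prepends 2 by hand.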

Skipping even numbers is a special case of a "wheel". You can skip sieving multiples of 2 by always incrementing by 2; you can skip sieving multiples of 2 and 3 by alternately incrementing by 2 and 4; you can skip sieving multiples of 2, 3, 5, and 7 by cycling through the increments 2, 4, 2, 4, 6, 2, 6, … (there are 48 increments in the repeating sequence), and so on. So, you could extend this code by first finding all the primes up to x, then building a wheel, then using that wheel to find all the primes between x and n.
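If you want to see where those increments come from, here is a minimal sketch that computes them. The function name wheel_gaps is made up for this example. It lists the gaps starting from 1, so the sequence is a rotation of the one you get when starting from the first prime after the basis.

```python
from math import gcd

def wheel_gaps(basis):
    """Return the repeating increments that skip multiples of the primes in basis."""
    size = 1
    for p in basis:
        size *= p
    # Residues coprime to every basis prime, over one full turn of the wheel.
    spokes = [r for r in range(1, size + 1) if gcd(r, size) == 1]
    # Close the cycle by wrapping to the first spoke of the next turn.
    spokes.append(spokes[0] + size)
    return [b - a for a, b in zip(spokes, spokes[1:])]

print(wheel_gaps((2,)))               # [2]
print(wheel_gaps((2, 3)))             # [4, 2]
print(len(wheel_gaps((2, 3, 5, 7))))  # 48
```

The increments of each wheel sum to the product of its basis primes (2, 6, and 210 here), which is why the storage for the wheel grows so quickly as you add primes.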

But that's adding a lot of complexity. And once you get too far beyond 7, the cost (both in time, and in space for storing the wheel) swamps the savings. And if your whole goal is not to find the primes before x, finding the primes before x so you don't have to find them seems kind of silly. :)

The simpler thing to do is just find all the primes up to n, and throw out the ones below x. Which you can do with a trivial change at the end:

primes = numpy.r_[2,result]
return primes[primes>=x]

Of course there are ways to do this without wasting storage for those initial primes you're going to throw away. They'd be a bit complicated to work into this algorithm (you'd probably want to build the array in sections, then drop each section that's entirely < x as you go, then stack all the remaining sections); it would be far easier to use a different implementation of the algorithm that isn't designed for speed and simplicity over space…

And of course there are different prime-finding algorithms that don't require enumerating all the primes up to x in the first place. But if you want to use this implementation of this algorithm, that doesn't matter.
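For illustration only, here is a rough sketch of one such alternative, a segmented sieve in plain Python. It is a different technique, not a modification of the numpy code above, and the name primes_in_range is hypothetical. It only needs the primes up to the square root of n, plus one flag per number in [x, n).

```python
from math import isqrt

def primes_in_range(x, n):
    """Return the primes p with x <= p < n, storing flags only for [x, n)."""
    x = max(x, 2)
    if n <= x:
        return []
    # Step 1: ordinary sieve for the base primes up to sqrt(n - 1).
    limit = isqrt(n - 1)
    base = [True] * (limit + 1)
    base[0] = base[1] = False
    for p in range(2, isqrt(limit) + 1):
        if base[p]:
            for m in range(p * p, limit + 1, p):
                base[m] = False
    # Step 2: cross off multiples of each base prime inside the window only.
    window = [True] * (n - x)
    for p in range(2, limit + 1):
        if base[p]:
            # First multiple of p that is >= p*p and >= x.
            start = max(p * p, -(-x // p) * p)
            for m in range(start, n, p):
                window[m - x] = False
    return [x + i for i, is_prime in enumerate(window) if is_prime]

print(primes_in_range(50, 100))
# [53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
```

This will be much slower than the numpy version for small x, so it is only worth considering when x is large and the storage for the primes below x actually matters.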




Answer 2:


Since you're now interested in looking into other algorithms or other implementations, try this one. It doesn't use numpy, but it is rather fast. I've tried a few variations on this theme, including using sets, and pre-computing a table of low primes, but they were all slower than this one.

#! /usr/bin/env python

''' Prime range sieve.

    Written by PM 2Ring 2014.10.15

    For range(0, 30000000) this is actually _faster_ than the 
    plain Eratosthenes sieve in sieve3.py !!!
'''

import sys

def potential_primes():
    ''' Make a generator for 2, 3, 5, & thence all numbers coprime to 30 '''
    s = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    for i in s:
        yield i
    s = (1,) + s[3:] 
    j = 30
    while True:
        for i in s:
            yield j + i
        j += 30


def range_sieve(lo, hi):
    ''' Create a list of all primes in the range(lo, hi) '''

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False

    ihi = int(hi ** 0.5)
    for i in potential_primes():
        if i > ihi: 
            break

        #Find first multiple of i that is >= i*i and >= lo
        ilo = max(i, 1 + (lo - 1) // i ) * i

        #Determine how many multiples of i >= ilo are in range
        n = 1 + (hi - ilo - 1) // i

        #Mark them as composite
        primes[ilo - lo : : i] = n * [False]

    return [i for i,v in enumerate(primes, lo) if v]


def main():
    lo = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    hi = int(sys.argv[2]) if len(sys.argv) > 2 else lo + 30
    #print lo, hi

    primes = range_sieve(lo, hi)
    #print len(primes)
    print(primes)
    #print primes[:10], primes[-10:]


if __name__ == '__main__':
    main()

And here's a link to the plain Eratosthenes sieve that I mentioned in the docstring, in case you want to compare this program to that one.

You could improve this slightly by getting rid of the loop under the "#Eliminate 0 and 1, if necessary" comment. And I guess it might be slightly faster if you avoided looking at even numbers; it'd certainly use less memory. But then you'd have to handle the cases when 2 was inside the range, and I figure that the fewer tests you have, the faster this thing will run.


Here's a minor improvement to that code: replace

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False

with

    #Eliminate 0 and 1, if necessary
    lo = max(2, lo)

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

However, the original form may be preferable if you want to return the plain bool list rather than performing the enumerate to build a list of integers: the bool list is more useful for testing if a given number is prime; OTOH, the enumerate could be used to build a set rather than a list.
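Here is a simplified sketch of that trade-off. The function range_flags is a hypothetical cut-down variant written for this example: it keeps lo unchanged and returns the bool list, and for brevity it tries every integer i as a divisor instead of using potential_primes().

```python
def range_flags(lo, hi):
    """Return flags where flags[k - lo] is True exactly when k is prime, for lo <= k < hi."""
    flags = [True] * (hi - lo)
    # Keep lo unchanged so that index k - lo stays valid for the caller.
    for k in range(lo, min(2, hi)):
        flags[k - lo] = False
    i = 2
    while i * i < hi:
        # First multiple of i that is >= i*i and >= lo.
        first = max(i, -(-lo // i)) * i
        for m in range(first, hi, i):
            flags[m - lo] = False
        i += 1
    return flags

lo, hi = 50, 100
flags = range_flags(lo, hi)
print(flags[97 - lo])  # True, 97 is prime
print(flags[91 - lo])  # False, 91 = 7 * 13

# The same enumerate that builds the list of integers can build a set instead.
prime_set = {k for k, v in enumerate(flags, lo) if v}
print(53 in prime_set)  # True
```

The bool list gives a constant-time primality test by index, but only for numbers inside the sieved range; the set gives the same test without the caller having to remember lo.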



Source: https://stackoverflow.com/questions/26351209/sieve-of-eratosthenes-primes-between-x-and-n
