Given Prime Number N, Compute the Next Prime?

后端 未结 9 1785
忘掉有多难
忘掉有多难 2020-11-29 17:41

A coworker just told me that the C# Dictionary collection resizes by prime numbers for arcane reasons relating to hashing. And my immediate question was, \"how does it know

9条回答
  •  天涯浪人
    2020-11-29 18:00

    About a year ago I was working this area for libc++ while implementing the unordered (hash) containers for C++11. I thought I would share my experiences here. This experience supports marcog's accepted answer for a reasonable definition of "brute force".

    That means that even a simple brute force will be fast enough in most circumstances, taking O(ln(p)*sqrt(p)) on average.

    I developed several implementations of size_t next_prime(size_t n) where the spec for this function is:

    Returns: The smallest prime that is greater than or equal to n.

    Each implementation of next_prime is accompanied by a helper function is_prime. is_prime should be considered a private implementation detail; not meant to be called directly by the client. Each of these implementations was of course tested for correctness, but also tested with the following performance test:

    int main()
    {
        typedef std::chrono::high_resolution_clock Clock;
        typedef std::chrono::duration ms;
        Clock::time_point t0 = Clock::now();
    
        std::size_t n = 100000000;
        std::size_t e = 100000;
        for (std::size_t i = 0; i < e; ++i)
            n = next_prime(n+1);
    
        Clock::time_point t1 = Clock::now();
        std::cout << e/ms(t1-t0).count() << " primes/millisecond\n";
        return n;
    }
    

    I should stress that this is a performance test, and does not reflect typical usage, which would look more like:

    // Overflow checking not shown for clarity purposes
    n = next_prime(2*n + 1);
    

    All performance tests were compiled with:

    clang++ -stdlib=libc++ -O3 main.cpp
    

    Implementation 1

    There are seven implementations. The purpose for displaying the first implementation is to demonstrate that if you fail to stop testing the candidate prime x for factors at sqrt(x) then you have failed to even achieve an implementation that could be classified as brute force. This implementation is brutally slow.

    bool
    is_prime(std::size_t x)
    {
        if (x < 2)
            return false;
        for (std::size_t i = 2; i < x; ++i)
        {
            if (x % i == 0)
                return false;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t x)
    {
        for (; !is_prime(x); ++x)
            ;
        return x;
    }
    

    For this implementation only I had to set e to 100 instead of 100000, just to get a reasonable running time:

    0.0015282 primes/millisecond
    

    Implementation 2

    This implementation is the slowest of the brute force implementations and the only difference from implementation 1 is that it stops testing for primeness when the factor surpasses sqrt(x).

    bool
    is_prime(std::size_t x)
    {
        if (x < 2)
            return false;
        for (std::size_t i = 2; true; ++i)
        {
            std::size_t q = x / i;
            if (q < i)
                return true;
            if (x % i == 0)
                return false;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t x)
    {
        for (; !is_prime(x); ++x)
            ;
        return x;
    }
    

    Note that sqrt(x) isn't directly computed, but inferred by q < i. This speeds things up by a factor of thousands:

    5.98576 primes/millisecond
    

    and validates marcog's prediction:

    ... this is well within the constraints of most problems taking on the order of a millisecond on most modern hardware.

    Implementation 3

    One can nearly double the speed (at least on the hardware I'm using) by avoiding use of the % operator:

    bool
    is_prime(std::size_t x)
    {
        if (x < 2)
            return false;
        for (std::size_t i = 2; true; ++i)
        {
            std::size_t q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t x)
    {
        for (; !is_prime(x); ++x)
            ;
        return x;
    }
    
    11.0512 primes/millisecond
    

    Implementation 4

    So far I haven't even used the common knowledge that 2 is the only even prime. This implementation incorporates that knowledge, nearly doubling the speed again:

    bool
    is_prime(std::size_t x)
    {
        for (std::size_t i = 3; true; i += 2)
        {
            std::size_t q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t x)
    {
        if (x <= 2)
            return 2;
        if (!(x & 1))
            ++x;
        for (; !is_prime(x); x += 2)
            ;
        return x;
    }
    
    21.9846 primes/millisecond
    

    Implementation 4 is probably what most people have in mind when they think "brute force".

    Implementation 5

    Using the following formula you can easily choose all numbers which are divisible by neither 2 nor 3:

    6 * k + {1, 5}
    

    where k >= 1. The following implementation uses this formula, but implemented with a cute xor trick:

    bool
    is_prime(std::size_t x)
    {
        std::size_t o = 4;
        for (std::size_t i = 5; true; i += o)
        {
            std::size_t q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            o ^= 6;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t x)
    {
        switch (x)
        {
        case 0:
        case 1:
        case 2:
            return 2;
        case 3:
            return 3;
        case 4:
        case 5:
            return 5;
        }
        std::size_t k = x / 6;
        std::size_t i = x - 6 * k;
        std::size_t o = i < 2 ? 1 : 5;
        x = 6 * k + o;
        for (i = (3 + o) / 2; !is_prime(x); x += i)
            i ^= 6;
        return x;
    }
    

    This effectively means that the algorithm has to check only 1/3 of the integers for primeness instead of 1/2 of the numbers and the performance test shows the expected speed up of nearly 50%:

    32.6167 primes/millisecond
    

    Implementation 6

    This implementation is a logical extension of implementation 5: It uses the following formula to compute all numbers which are not divisible by 2, 3 and 5:

    30 * k + {1, 7, 11, 13, 17, 19, 23, 29}
    

    It also unrolls the inner loop within is_prime, and creates a list of "small primes" that is useful for dealing with numbers less than 30.

    static const std::size_t small_primes[] =
    {
        2,
        3,
        5,
        7,
        11,
        13,
        17,
        19,
        23,
        29
    };
    
    static const std::size_t indices[] =
    {
        1,
        7,
        11,
        13,
        17,
        19,
        23,
        29
    };
    
    bool
    is_prime(std::size_t x)
    {
        const size_t N = sizeof(small_primes) / sizeof(small_primes[0]);
        for (std::size_t i = 3; i < N; ++i)
        {
            const std::size_t p = small_primes[i];
            const std::size_t q = x / p;
            if (q < p)
                return true;
            if (x == q * p)
                return false;
        }
        for (std::size_t i = 31; true;)
        {
            std::size_t q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 6;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 4;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 2;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 4;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 2;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 4;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 6;
    
            q = x / i;
            if (q < i)
                return true;
            if (x == q * i)
                return false;
            i += 2;
        }
        return true;
    }
    
    std::size_t
    next_prime(std::size_t n)
    {
        const size_t L = 30;
        const size_t N = sizeof(small_primes) / sizeof(small_primes[0]);
        // If n is small enough, search in small_primes
        if (n <= small_primes[N-1])
            return *std::lower_bound(small_primes, small_primes + N, n);
        // Else n > largest small_primes
        // Start searching list of potential primes: L * k0 + indices[in]
        const size_t M = sizeof(indices) / sizeof(indices[0]);
        // Select first potential prime >= n
        //   Known a-priori n >= L
        size_t k0 = n / L;
        size_t in = std::lower_bound(indices, indices + M, n - k0 * L) - indices;
        n = L * k0 + indices[in];
        while (!is_prime(n))
        {
            if (++in == M)
            {
                ++k0;
                in = 0;
            }
            n = L * k0 + indices[in];
        }
        return n;
    }
    

    This is arguably getting beyond "brute force" and is good for boosting the speed another 27.5% to:

    41.6026 primes/millisecond
    

    Implementation 7

    It is practical to play the above game for one more iteration, developing a formula for numbers not divisible by 2, 3, 5 and 7:

    210 * k + {1, 11, ...},
    

    The source code isn't shown here, but is very similar to implementation 6. This is the implementation I chose to actually use for the unordered containers of libc++, and that source code is open source (found at the link).

    This final iteration is good for another 14.6% speed boost to:

    47.685 primes/millisecond
    

    Use of this algorithm assures that clients of libc++'s hash tables can choose any prime they decide is most beneficial to their situation, and the performance for this application is quite acceptable.

提交回复
热议问题