Generate a list of primes up to a certain number

纵饮孤独 submitted on 2019-11-26 20:17:42

This is an implementation of the Sieve of Eratosthenes algorithm in R.

sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e6) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   for(i in last.prime:floor(sqrt(n)))
   {
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
   }
   which(primes)
}

 sieve(1000000)
John

That sieve posted by George Dontas is a good starting point. Here's a much faster version, with a running time of 0.095s for primes up to 1e6, as opposed to 30s for the original version.

sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e8) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   fsqr <- floor(sqrt(n))
   while (last.prime <= fsqr)
   {
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      sel <- which(primes[(last.prime+1):(fsqr+1)])
      if(length(sel) > 0){
        last.prime <- last.prime + min(sel)
      }else last.prime <- fsqr+1
   }
   which(primes)
}

Here are some alternate algorithms below, coded about as fast as possible in R. They are slower than the sieve but a heck of a lot faster than the questioner's original post.

Here's a recursive function that uses mod but is vectorized. It returns for 1e5 almost instantaneously and 1e6 in under 2s.

primes <- function(n){
    primesR <- function(p, i = 1){
        f <- p %% p[i] == 0 & p != p[i]
        if (any(f)){
            p <- primesR(p[!f], i+1)
        }
        p
    }
    primesR(2:n)
}

The next one isn't recursive and is faster again. The code below does primes up to 1e6 in about 1.5s on my machine.

primest <- function(n){
    p <- 2:n
    i <- 1
    while (p[i] <= sqrt(n)) {
        p <-  p[p %% p[i] != 0 | p==p[i]]
        i <- i+1
    }
    p
}

BTW, the spuRs package has a number of prime finding functions including a sieve of E. Haven't checked to see what the speed is like for them.

And while I'm writing a very long answer... here's how you'd check in R if one value is prime.

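isPrime <- function(x){
    if (x < 2) return(FALSE)   # 0 and 1 are not prime
    if (x < 4) return(TRUE)    # 2 and 3 are prime; 2:ceiling(sqrt(2)) would wrongly include 2 itself
    div <- 2:floor(sqrt(x))
    !any(x %% div == 0)
}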

Best way that I know of to generate all primes (without getting into crazy math) is to use the Sieve of Eratosthenes.

It is pretty straightforward to implement and allows you to calculate primes without using division or modulus. The only downside is that it is memory intensive, but various optimizations can be made to reduce memory use (ignoring all even numbers, for instance).
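The "ignore all even numbers" optimization mentioned above can be sketched as follows. This is only an illustration in base R, and sieve_odds is a hypothetical name, not a function from any package. Index k of the logical vector stands for the odd number 2*k + 1, so the vector is about half as long as in a plain sieve.

```r
# Odds-only sieve sketch (hypothetical helper, base R only).
# is_prime[k] represents the odd number 2*k + 1; even numbers are never stored.
sieve_odds <- function(n) {
  n <- as.integer(n)
  if (n < 2L) return(integer(0))
  if (n < 3L) return(2L)
  m <- (n - 1L) %/% 2L              # how many odd numbers 3, 5, ... are <= n
  is_prime <- rep(TRUE, m)
  k <- 1L
  while ((2L * k + 1L)^2 <= n) {
    if (is_prime[k]) {
      p <- 2L * k + 1L
      # Start crossing off at p^2, whose index is (p^2 - 1) / 2.
      # A step of p in index space is a step of 2*p in value space,
      # which skips the even multiples automatically.
      is_prime[seq.int((p * p - 1L) %/% 2L, m, by = p)] <- FALSE
    }
    k <- k + 1L
  }
  c(2L, 2L * which(is_prime) + 1L)
}

sieve_odds(30)
```

The trade-off is slightly more index arithmetic in exchange for roughly half the memory of a sieve that stores a flag for every integer.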

Prime Numbers in R

The OP asked to generate all prime numbers below one billion. All of the answers provided thus far are either not capable of doing this, will take a long time to execute, or are not currently available in R (see the answer by @Charles). The package RcppAlgos (I am the author) is capable of generating the requested output in just over 1 second using only one thread. It is based on the segmented sieve of Eratosthenes by Kim Walisch.

RcppAlgos

library(RcppAlgos)
system.time(primeSieve(10^9))  ## using 1 thread
  user  system elapsed 
 1.218   0.088   1.307

Using Multiple Threads

And in recent versions (i.e. >= 2.3.0), we can utilize multiple threads for even faster generation. For example, now we can generate the primes up to 1 billion in under half a second!

system.time(primeSieve(10^9, nThreads = 8))
  user  system elapsed 
 2.239   0.046   0.416

Summary of Available Packages in R for Generating Primes

library(schoolmath)
library(primefactr)
library(sfsmisc)
library(primes)
library(numbers)
library(spuRs)
library(randtoolbox)
library(matlab)
## and 'sieve' from @John

Before we begin, we note that the problems pointed out by @Henrik in schoolmath still exist. Observe:

## 1 is NOT a prime number
schoolmath::primes(start = 1, end = 20)
[1]  1  2  3  5  7 11 13 17 19   

## This should return 1, however it is saying that 52
##  "prime" numbers less than 10^4 are divisible by 7!!
sum(schoolmath::primes(start = 1, end = 10^4) %% 7L == 0)
[1] 52

The point is, don't use schoolmath for generating primes at this point (no offense to the author... In fact, I have filed an issue with the maintainer).
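A check of this kind can be applied to any prime generator before trusting it. The following is a minimal sketch in base R; check_primes is a hypothetical helper written for this illustration (it is not part of schoolmath or any other package), and it compares a candidate vector against slow but simple trial division.

```r
# Hypothetical helper: TRUE only if `candidate` holds exactly the primes <= n.
check_primes <- function(candidate, n) {
  is_prime_slow <- function(x) {
    if (x < 2) return(FALSE)
    if (x < 4) return(TRUE)
    all(x %% 2:floor(sqrt(x)) != 0)   # trial division by every integer up to sqrt(x)
  }
  expected <- Filter(is_prime_slow, seq_len(n))
  identical(as.numeric(candidate), as.numeric(expected))
}

check_primes(c(2, 3, 5, 7, 11, 13, 17, 19), 20)     # TRUE
check_primes(c(1, 2, 3, 5, 7, 11, 13, 17, 19), 20)  # FALSE: 1 is not prime
```

Trial division is far too slow for benchmarking, but it is simple enough to serve as a reference for small n.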

Let's look at randtoolbox as it appears to be incredibly efficient. Observe:

library(microbenchmark)
## the argument for get.primes is for how many prime numbers you need
## whereas most packages get all primes less than a certain number
microbenchmark(priRandtoolbox = get.primes(78498),
              priRcppAlgos = RcppAlgos::primeSieve(10^6), unit = "relative")
Unit: relative
          expr     min       lq     mean  median       uq      max neval
priRandtoolbox  1.0000  1.00000 1.000000 1.00000 1.000000 1.000000   100
  priRcppAlgos 14.0758 14.20469 8.555965 6.99534 7.114415 2.809296   100

A closer look reveals that it is essentially a lookup table (found in the file randtoolbox.c from the source code).

#include "primes.h"

void reconstruct_primes()
{
    int i;
    if (primeNumber[2] == 1)
        for (i = 2; i < 100000; i++)
            primeNumber[i] = primeNumber[i-1] + 2*primeNumber[i];
}

Where primes.h is a header file that contains an array of "halves of differences between prime numbers". Thus, you will be limited by the number of elements in that array for generating primes (i.e. the first one hundred thousand primes). If you are only working with smaller primes (less than 1,299,709 (i.e. the 100,000th prime)) and you are working on a project that requires the nth prime, randtoolbox is the way to go.
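To make the "halves of differences" idea concrete, here is a small sketch in base R. The function names are hypothetical, and it assumes (inferred from the C loop starting at i = 2) that the first two entries of the table hold the primes 2 and 3 themselves, with every later entry holding half the gap to the previous prime.

```r
# Encode: keep the first two primes as-is, then store (p[i] - p[i-1]) / 2.
encode_half_gaps <- function(p) c(p[1:2], diff(p[-1]) / 2)

# Decode: same recurrence as the C loop, value[i] = value[i-1] + 2 * stored[i],
# expressed with a cumulative sum instead of an explicit loop.
decode_half_gaps <- function(enc) {
  c(enc[1], enc[2] + c(0, cumsum(2 * enc[-(1:2)])))
}

p   <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
enc <- encode_half_gaps(p)
enc                                   # 2 3 1 1 2 1 2 1 2 3
identical(decode_half_gaps(enc), p)   # TRUE
```

Since every gap between odd primes is even, the halved gaps are small integers, which is presumably why a table in this form is compact.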

Below, we perform benchmarks on the rest of the packages.

Primes up to One Million

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^6),
               priNumbers = numbers::Primes(10^6),
               priSpuRs = spuRs::primesieve(c(), 2:10^6),
               priPrimes = primes::generate_primes(1, 10^6),
               priPrimefactr = primefactr::AllPrimesUpTo(10^6),
               priSfsmisc = sfsmisc::primes(10^6),
               priMatlab = matlab::primes(10^6),
               priJohnSieve = sieve(10^6),
               unit = "relative")
Unit: relative
          expr        min        lq      mean     median        uq       max neval
  priRcppAlgos   1.000000   1.00000   1.00000   1.000000   1.00000  1.000000   100
    priNumbers  19.092499  22.29017  25.79069  22.527344  23.53524 16.439443   100
      priSpuRs 210.980827 204.75970 203.74259 218.533689 218.12819 64.208745   100
     priPrimes  43.572518  40.61982  36.36935  37.234043  37.55404 10.399216   100
 priPrimefactr  35.850982  37.38720  39.47520  35.848167  37.62628 19.540713   100
    priSfsmisc   9.462374  10.54760  10.55800   9.921876  12.25639  3.887074   100
     priMatlab  19.698811  22.70576  25.39655  22.851422  23.63050 15.265014   100
  priJohnSieve  10.149523  10.68950  11.42043  10.437246  12.72949 11.595701   100

Primes up to Ten Million

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^7),
               priNumbers = numbers::Primes(10^7),
               priSpuRs = spuRs::primesieve(c(), 2:10^7),
               priPrimes = primes::generate_primes(1, 10^7),
               priPrimefactr = primefactr::AllPrimesUpTo(10^7),
               priSfsmisc = sfsmisc::primes(10^7),
               priMatlab = matlab::primes(10^7),
               priJohnSieve = sieve(10^7),
               unit = "relative", times = 20)
Unit: relative
          expr       min        lq      mean    median        uq       max neval
  priRcppAlgos   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000    20
    priNumbers  28.39102  27.63922  27.96319  27.34067  25.44119  37.72224    20
      priSpuRs 469.06554 469.09160 445.61612 457.34482 419.91417 398.29053    20
     priPrimes 117.11150 111.35547 107.61258 109.10053 102.32481  97.34148    20
 priPrimefactr  46.13612  47.24150  47.65271  47.34762  46.58394  50.10061    20
    priSfsmisc  17.37116  16.99990  17.64440  16.77242  17.10034  25.25716    20
     priMatlab  27.24177  27.17770  28.79239  27.37511  26.70660  36.79823    20
  priJohnSieve  16.83924  17.43330  18.63179  17.83366  17.24865  28.89491    20

Primes up to One Hundred Million

For the next two benchmarks, we only consider RcppAlgos, numbers, sfsmisc, and the sieve function by @John.

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^8),
               priNumbers = numbers::Primes(10^8),
               priSfsmisc = sfsmisc::primes(10^8),
               priJohnSieve = sieve(10^8),
               unit = "relative", times = 20)
Unit: relative
         expr      min       lq     mean   median       uq      max neval
 priRcppAlgos  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000    20
   priNumbers 31.89653 30.93312 30.73546 30.70144 30.20808 28.79867    20
   priSfsmisc 21.13420 20.14822 19.84391 19.77317 19.40612 18.05891    20
 priJohnSieve 21.39554 20.24689 20.34909 20.24419 20.09711 19.16832    20

Primes up to One Billion

N.B. We must remove the condition if(n > 1e8) stop("n too large") in the sieve function.

## See top section
## system.time(primeSieve(10^9)). 
##  user  system elapsed 
## 1.218   0.088   1.307

## gc()
system.time(numbers::Primes(10^9))
   user  system elapsed 
 32.375  12.129  45.651        ## ~35x slower than RcppAlgos

## gc()
system.time(sieve(10^9))
  user  system elapsed 
26.266   3.906  30.201         ## ~23x slower than RcppAlgos

## gc()
system.time(sfsmisc::primes(10^9))
  user  system elapsed 
24.292   3.389  27.710         ## ~21x slower than RcppAlgos

From these comparisons, we see that RcppAlgos scales much better as n gets larger.

 _________________________________________________________
|            |   1e6   |   1e7    |   1e8     |    1e9    |
|------------|---------|----------|-----------|-----------|
| RcppAlgos  |   1.00  |   1.00   |    1.00   |    1.00   |
|   sfsmisc  |   9.92  |  16.77   |   19.77   |   21.20   |
| JohnSieve  |  10.44  |  17.83   |   20.24   |   23.11   |
|   numbers  |  22.53  |  27.34   |   30.70   |   34.93   |
 ---------------------------------------------------------

Primes Over a Range

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^9, 10^9 + 10^6),
               priNumbers = numbers::Primes(10^9, 10^9 + 10^6),
               priPrimes = primes::generate_primes(10^9, 10^9 + 10^6),
               unit = "relative", times = 20)
Unit: relative
         expr      min       lq    mean   median       uq      max neval
 priRcppAlgos   1.0000   1.0000   1.000   1.0000   1.0000   1.0000    20
   priNumbers 115.3000 112.1195 106.295 110.3327 104.9106  81.6943    20
    priPrimes 983.7902 948.4493 890.243 919.4345 867.5775 708.9603    20

Primes up to 10 billion in Under 6 Seconds

##  primes less than 10 billion
system.time(tenBillion <- RcppAlgos::primeSieve(10^10, nThreads = 8))
  user  system elapsed 
27.319   1.971   5.822

length(tenBillion)
[1] 455052511

## Warning!!!... Large object created
tenBillionSize <- object.size(tenBillion)
print(tenBillionSize, units = "Gb")
3.4 Gb
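The reported size can be checked with a little arithmetic. The sketch below assumes the result is a double vector (8 bytes per element), which it has to be here because values up to 10^10 exceed R's 32-bit integer range, and that the "Gb" unit of object.size means 1024^3 bytes.

```r
n_primes <- 455052511        # length(tenBillion) reported above
bytes_per_double <- 8        # values above .Machine$integer.max cannot be stored as integers
size_gib <- n_primes * bytes_per_double / 1024^3
round(size_gib, 2)           # about 3.39, consistent with the 3.4 Gb printed above
```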

Primes Over a Range of Very Large Numbers:

Prior to version 2.3.0, we were simply using the same algorithm for numbers of every magnitude. This is okay for smaller numbers, when most of the sieving primes have at least one multiple in each segment (generally, the segment size is limited by the size of the L1 cache, ~32KiB). However, when we are dealing with larger numbers, many of the sieving primes will have fewer than one multiple per segment. This situation creates a lot of overhead, as we are performing many worthless checks that pollute the cache. Thus, we observe much slower generation of primes when the numbers are very large. Observe for version 2.2.0 (see Installing older version of R package):

## Install version 2.2.0
## packageurl <- "http://cran.r-project.org/src/contrib/Archive/RcppAlgos/RcppAlgos_2.2.0.tar.gz"
## install.packages(packageurl, repos=NULL, type="source")

system.time(old <- RcppAlgos::primeSieve(1e15, 1e15 + 1e9))
 user  system elapsed 
7.932   0.134   8.067

And now using the cache friendly improvement originally developed by Tomás Oliveira, we see drastic improvements:

## Reinstall current version from CRAN
## install.packages("RcppAlgos"); library(RcppAlgos)
system.time(cacheFriendly <- primeSieve(1e15, 1e15 + 1e9))
 user  system elapsed 
2.462   0.197   2.660   ## Over 3x faster than older versions

system.time(primeSieve(1e15, 1e15 + 1e9, nThreads = 8))
 user  system elapsed 
5.037   0.806   0.981   ##  Over 8x faster using multiple threads
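For readers unfamiliar with the terms "segment" and "sieving primes" used in this section, the basic idea can be sketched in base R. This is not the RcppAlgos implementation and contains none of its cache optimizations; small_primes and primes_in_range are hypothetical helpers, and the sketch assumes hi is below 2^53 so that doubles stay exact. It will be far slower than the compiled code benchmarked above.

```r
# Plain sieve used only to obtain the sieving primes up to sqrt(hi).
small_primes <- function(limit) {
  if (limit < 2) return(numeric(0))
  flags <- rep(TRUE, limit)
  flags[1] <- FALSE
  i <- 2
  while (i * i <= limit) {
    if (flags[i]) flags[seq(i * i, limit, by = i)] <- FALSE
    i <- i + 1
  }
  as.numeric(which(flags))   # doubles, so p * p below cannot overflow
}

# Segmented sieve sketch: primes in [lo, hi], processed `segment` numbers at a time.
primes_in_range <- function(lo, hi, segment = 1e5) {
  lo <- max(lo, 2)
  if (hi < lo) return(numeric(0))
  sieving <- small_primes(floor(sqrt(hi)))
  out <- list()
  start <- lo
  while (start <= hi) {
    end <- min(start + segment - 1, hi)
    flags <- rep(TRUE, end - start + 1)        # flags[k] represents start + k - 1
    for (p in sieving) {
      # first multiple of p inside the segment, never smaller than p^2
      first <- max(p * p, ceiling(start / p) * p)
      if (first <= end) flags[seq(first, end, by = p) - start + 1] <- FALSE
    }
    out[[length(out) + 1]] <- start - 1 + which(flags)
    start <- end + 1
  }
  unlist(out)
}

primes_in_range(1e9, 1e9 + 10)   # 1000000007 1000000009
```

Note that the inner loop visits every sieving prime for every segment, even when a prime has no multiple in that segment. That is the overhead described above for very large numbers.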

Take Away

  1. There are many great packages available for generating primes
  2. If you are looking for speed in general, there is no match to RcppAlgos::primeSieve, especially for larger numbers.
  3. If you are working with small primes, look no further than randtoolbox::get.primes.
  4. If you need primes in a range, the packages numbers, primes, & RcppAlgos are the way to go.
  5. The importance of good programming practices cannot be overemphasized (e.g. vectorization, using correct data types, etc.). This is most aptly demonstrated by the pure base R solution provided by @John. It is concise, clear, and very efficient.

This method should be faster and simpler.

allPrime <- function(n) {
  primes <- rep(TRUE, n)
  primes[1] <- FALSE
  for (i in 1:sqrt(n)) {
    if (primes[i]) primes[seq(i^2, n, i)] <- FALSE
  }
  which(primes)
}

0.12 second on my computer for n = 1e6

I implemented this in function AllPrimesUpTo in package primefactr.

I recommend primegen, Dan Bernstein's implementation of the Atkin-Bernstein sieve. It's very fast and will scale well to other problems. You'll need to pass data out to the program to use it, but I imagine there are ways to do that?

You can also cheat and use the primes() function in the schoolmath package :D

The isPrime() function posted above could use sieve(). One only needs to check whether any of the primes <= ceiling(sqrt(x)) divide x with no remainder. It also needs to handle 1 and 2.

isPrime <- function(x) {
    div <- sieve(ceiling(sqrt(x)))
    (x > 1) & ((x == 2) | !any(x %% div == 0))
}

for (i in 2:1000) {
  a = (2:(i-1))
  b = as.matrix(i%%a)
  c = colSums(b != 0)
  if (c == i-2) {
    print(i)
  }
}

Every odd number (i) up to (a) is checked against the list of prime numbers (n) found so far, i.e. the primes generated while checking the numbers below (i).

Thanks for suggestions:

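prime = function(a){
    if (a < 2) return(integer(0))
    n = c(2)                        # primes found so far
    i = 3
    while (i <= a) {
      r = 0                         # reset for every i; becomes 1 once a divisor is found
      for (j in n[n <= sqrt(i)]) {  # only primes up to sqrt(i) need testing
        if (i %% j == 0) {
          r = 1
          break
        }
      }
      if (r != 1) { n = c(n, i) }
      i = i + 2                     # skip even numbers
    }
    print(n)
  }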