Question
I'm trying to generate a list of primes below 1 billion. I'm trying this, but this kind of structure is pretty shitty. Any suggestions?
a <- 1:1000000000
d <- 0
b <- for (i in a) {for (j in 1:i) {if (i %% j !=0) {d <- c(d,i)}}}
Answer 1:
This is an implementation of the Sieve of Eratosthenes algorithm in R.
sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e6) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   for(i in last.prime:floor(sqrt(n)))
   {
      ## cross off every multiple of the current prime
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      ## advance to the next number that is still marked prime
      last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
   }
   which(primes)
}
sieve(1000000)
Answer 2:
That sieve posted by George Dontas is a good starting point. Here's a much faster version: it finds the primes up to 1e6 in 0.095s, as opposed to 30s for the original version.
sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e8) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   fsqr <- floor(sqrt(n))
   while (last.prime <= fsqr)
   {
      ## cross off every multiple of the current prime
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      ## only look for the next prime up to just past sqrt(n)
      sel <- which(primes[(last.prime+1):(fsqr+1)])
      ## sel holds indices, so test for a non-empty result
      if(length(sel) > 0){
         last.prime <- last.prime + min(sel)
      }else last.prime <- fsqr+1
   }
   which(primes)
}
Here are some alternate algorithms below, coded about as fast as possible in R. They are slower than the sieve but a heck of a lot faster than the questioner's original post.
Here's a recursive function that uses mod but is vectorized. It returns for 1e5 almost instantaneously and 1e6 in under 2s.
primes <- function(n){
   primesR <- function(p, i = 1){
      ## flag multiples of the i-th remaining candidate (but not the candidate itself)
      f <- p %% p[i] == 0 & p != p[i]
      if (any(f)){
         p <- primesR(p[!f], i+1)
      }
      p
   }
   primesR(2:n)
}
The next one isn't recursive and is faster again. The code below does primes up to 1e6 in about 1.5s on my machine.
primest <- function(n){
   p <- 2:n
   i <- 1
   while (p[i] <= sqrt(n)) {
      ## drop multiples of p[i], keeping p[i] itself
      p <- p[p %% p[i] != 0 | p==p[i]]
      i <- i+1
   }
   p
}
BTW, the spuRs package has a number of prime-finding functions, including a Sieve of Eratosthenes. I haven't checked to see what the speed is like for them.
And while I'm writing a very long answer... here's how you'd check in R if one value is prime.
isPrime <- function(x){
   div <- 2:ceiling(sqrt(x))
   !any(x %% div == 0)
}
Answer 3:
Best way that I know of to generate all primes (without getting into crazy math) is to use the Sieve of Eratosthenes.
It is pretty straightforward to implement and allows you to calculate primes without using division or modulus. The only downside is that it is memory intensive, but various optimizations can be made to reduce memory use (ignoring all even numbers, for instance).
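As a rough illustration of the even-skipping optimization mentioned above, here is a minimal base R sketch. The name odd_sieve and its index mapping are assumptions made for this example, not code from any of the packages discussed in this thread. Only odd numbers get a flag, so the logical vector is about half as long as in a plain sieve.

```r
## Hypothetical helper: Sieve of Eratosthenes that stores flags for odd numbers only.
## Position k of is_prime stands for the odd number 2*k + 1 (k = 1 is 3, k = 2 is 5, ...).
odd_sieve <- function(n) {
  n <- as.integer(n)
  if (n < 2L) return(integer(0))
  if (n < 3L) return(2L)
  m <- (n - 1L) %/% 2L                 # how many odd candidates 3, 5, ... are <= n
  is_prime <- rep(TRUE, m)
  k <- 1L
  while ((2L * k + 1L)^2 <= n) {
    if (is_prime[k]) {
      p <- 2L * k + 1L
      ## start at p^2; a step of p in index space is a step of 2*p in value,
      ## which visits only the odd multiples of p
      start <- (p * p - 1L) %/% 2L
      is_prime[seq.int(start, m, by = p)] <- FALSE
    }
    k <- k + 1L
  }
  c(2L, 2L * which(is_prime) + 1L)
}

odd_sieve(30)
```

The trade-off is a slightly less obvious index mapping in exchange for roughly half the memory of a vector with one flag per integer.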
Answer 4:
Prime Numbers in R
The OP asked to generate all prime numbers below one billion. All of the answers provided thus far are either not capable of doing this, will take a long time to execute, or are currently not available in R (see the answer by @Charles). The package RcppAlgos (I am the author) is capable of generating the requested output in just over 1 second using only one thread. It is based on the segmented sieve of Eratosthenes by Kim Walisch.
RcppAlgos
library(RcppAlgos)
system.time(primeSieve(10^9)) ## using 1 thread
user system elapsed
1.218 0.088 1.307
Using Multiple Threads
And in recent versions (i.e. >= 2.3.0), we can utilize multiple threads for even faster generation. For example, now we can generate the primes up to 1 billion in under half a second!
system.time(primeSieve(10^9, nThreads = 8))
user system elapsed
2.239 0.046 0.416
Summary of Available Packages in R for Generating Primes
library(schoolmath)
library(primefactr)
library(sfsmisc)
library(primes)
library(numbers)
library(spuRs)
library(randtoolbox)
library(matlab)
## and 'sieve' from @John
Before we begin, we note that the problems pointed out by @Henrik in schoolmath still exist. Observe:
## 1 is NOT a prime number
schoolmath::primes(start = 1, end = 20)
[1] 1 2 3 5 7 11 13 17 19
## This should return 1, however it is saying that 52
## "prime" numbers less than 10^4 are divisible by 7!!
sum(schoolmath::primes(start = 1, end = 10^4) %% 7L == 0)
[1] 52
The point is, don't use schoolmath for generating primes at this point (no offense to the author... In fact, I have filed an issue with the maintainer).
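The two checks above are specific to one package. A similar sanity check can be written in base R and pointed at the output of any generator. The sketch below uses plain trial division, so it is only meant for small inputs; the name all_prime is an assumption made for this example.

```r
## Hypothetical checker: TRUE only if every element of x is a prime number.
## Uses trial division, so keep the input small.
all_prime <- function(x) {
  is_p <- function(v) {
    if (v < 2) return(FALSE)            # 0 and 1 are not prime
    if (v < 4) return(TRUE)             # 2 and 3 are prime
    all(v %% 2:floor(sqrt(v)) != 0)     # no divisor between 2 and sqrt(v)
  }
  all(vapply(x, is_p, logical(1)))
}

all_prime(c(2, 3, 5, 7, 11, 13, 17, 19))       # a correct list passes
all_prime(c(1, 2, 3, 5, 7, 11, 13, 17, 19))    # fails, because 1 is not prime
```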
Let's look at randtoolbox as it appears to be incredibly efficient. Observe:
library(microbenchmark)
## the argument for get.primes is for how many prime numbers you need
## whereas most packages get all primes less than a certain number
microbenchmark(priRandtoolbox = get.primes(78498),
priRcppAlgos = RcppAlgos::primeSieve(10^6), unit = "relative")
Unit: relative
expr min lq mean median uq max neval
priRandtoolbox 1.0000 1.00000 1.000000 1.00000 1.000000 1.000000 100
priRcppAlgos 14.0758 14.20469 8.555965 6.99534 7.114415 2.809296 100
A closer look reveals that it is essentially a lookup table (found in the file randtoolbox.c from the source code).
#include "primes.h"
void reconstruct_primes()
{
int i;
if (primeNumber[2] == 1)
for (i = 2; i < 100000; i++)
primeNumber[i] = primeNumber[i-1] + 2*primeNumber[i];
}
Where primes.h is a header file that contains an array of "halves of differences between prime numbers". Thus, you will be limited by the number of elements in that array for generating primes (i.e. the first one hundred thousand primes). If you are only working with smaller primes (less than 1,299,709 (i.e. the 100,000th prime)) and you are working on a project that requires the nth prime, randtoolbox is the way to go.
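To make the "halves of differences" encoding concrete, here is a small base R sketch of the same idea. The vector half_gaps is illustrative data typed in by hand for the primes up to 31, not the actual contents of primes.h, and the variable names are assumptions made for this example.

```r
## Illustrative data only: 2 and 3 are stored as-is, then each later prime
## is stored as half of its distance from the previous prime.
half_gaps <- c(1, 1, 2, 1, 2, 1, 2, 3, 1)

## Decode: a running sum of the doubled half-gaps, starting from 3
rebuilt <- c(2, 3, 3 + cumsum(2 * half_gaps))
rebuilt

## Encode: the reverse direction, starting from a known list of primes
known <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
encoded <- diff(known[-1]) / 2
identical(encoded, half_gaps)
```

Every gap between odd primes is even, so halving loses nothing and keeps the stored values small.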
Below, we perform benchmarks on the rest of the packages.
Primes up to One Million
microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^6),
priNumbers = numbers::Primes(10^6),
priSpuRs = spuRs::primesieve(c(), 2:10^6),
priPrimes = primes::generate_primes(1, 10^6),
priPrimefactr = primefactr::AllPrimesUpTo(10^6),
priSfsmisc = sfsmisc::primes(10^6),
priMatlab = matlab::primes(10^6),
priJohnSieve = sieve(10^6),
unit = "relative")
Unit: relative
expr min lq mean median uq max neval
priRcppAlgos 1.000000 1.00000 1.00000 1.000000 1.00000 1.000000 100
priNumbers 19.092499 22.29017 25.79069 22.527344 23.53524 16.439443 100
priSpuRs 210.980827 204.75970 203.74259 218.533689 218.12819 64.208745 100
priPrimes 43.572518 40.61982 36.36935 37.234043 37.55404 10.399216 100
priPrimefactr 35.850982 37.38720 39.47520 35.848167 37.62628 19.540713 100
priSfsmisc 9.462374 10.54760 10.55800 9.921876 12.25639 3.887074 100
priMatlab 19.698811 22.70576 25.39655 22.851422 23.63050 15.265014 100
priJohnSieve 10.149523 10.68950 11.42043 10.437246 12.72949 11.595701 100
Primes up to Ten Million
microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^7),
priNumbers = numbers::Primes(10^7),
priSpuRs = spuRs::primesieve(c(), 2:10^7),
priPrimes = primes::generate_primes(1, 10^7),
priPrimefactr = primefactr::AllPrimesUpTo(10^7),
priSfsmisc = sfsmisc::primes(10^7),
priMatlab = matlab::primes(10^7),
priJohnSieve = sieve(10^7),
unit = "relative", times = 20)
Unit: relative
expr min lq mean median uq max neval
priRcppAlgos 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 20
priNumbers 28.39102 27.63922 27.96319 27.34067 25.44119 37.72224 20
priSpuRs 469.06554 469.09160 445.61612 457.34482 419.91417 398.29053 20
priPrimes 117.11150 111.35547 107.61258 109.10053 102.32481 97.34148 20
priPrimefactr 46.13612 47.24150 47.65271 47.34762 46.58394 50.10061 20
priSfsmisc 17.37116 16.99990 17.64440 16.77242 17.10034 25.25716 20
priMatlab 27.24177 27.17770 28.79239 27.37511 26.70660 36.79823 20
priJohnSieve 16.83924 17.43330 18.63179 17.83366 17.24865 28.89491 20
Primes up to One Hundred Million
For the next two benchmarks, we only consider RcppAlgos, numbers, sfsmisc, and the sieve function by @John.
microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^8),
priNumbers = numbers::Primes(10^8),
priSfsmisc = sfsmisc::primes(10^8),
priJohnSieve = sieve(10^8),
unit = "relative", times = 20)
Unit: relative
expr min lq mean median uq max neval
priRcppAlgos 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 20
priNumbers 31.89653 30.93312 30.73546 30.70144 30.20808 28.79867 20
priSfsmisc 21.13420 20.14822 19.84391 19.77317 19.40612 18.05891 20
priJohnSieve 21.39554 20.24689 20.34909 20.24419 20.09711 19.16832 20
Primes up to One Billion
N.B. We must remove the condition if(n > 1e8) stop("n too large") in the sieve function.
## See top section
## system.time(primeSieve(10^9)).
## user system elapsed
## 1.218 0.088 1.307
## gc()
system.time(numbers::Primes(10^9))
user system elapsed
32.375 12.129 45.651 ## ~35x slower than RcppAlgos
## gc()
system.time(sieve(10^9))
user system elapsed
26.266 3.906 30.201 ## ~23x slower than RcppAlgos
## gc()
system.time(sfsmisc::primes(10^9))
user system elapsed
24.292 3.389 27.710 ## ~21x slower than RcppAlgos
From these comparisons, we see that RcppAlgos scales much better as n gets larger.
---------------------------------------------------------
|           |   1e6   |   1e7    |    1e8    |   1e9    |
|-----------|---------|----------|-----------|----------|
| RcppAlgos |   1.00  |   1.00   |    1.00   |   1.00   |
| sfsmisc   |   9.92  |  16.77   |   19.77   |  21.20   |
| JohnSieve |  10.44  |  17.83   |   20.24   |  23.11   |
| numbers   |  22.53  |  27.34   |   30.70   |  34.93   |
---------------------------------------------------------
Primes Over a Range
microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^9, 10^9 + 10^6),
priNumbers = numbers::Primes(10^9, 10^9 + 10^6),
priPrimes = primes::generate_primes(10^9, 10^9 + 10^6),
unit = "relative", times = 20)
Unit: relative
expr min lq mean median uq max neval
priRcppAlgos 1.0000 1.0000 1.000 1.0000 1.0000 1.0000 20
priNumbers 115.3000 112.1195 106.295 110.3327 104.9106 81.6943 20
priPrimes 983.7902 948.4493 890.243 919.4345 867.5775 708.9603 20
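Generating primes over a range does not require sieving everything below the range: only the primes up to the square root of the upper bound are needed to cross off composites inside the range. The base R sketch below shows that idea with a single segment. The name range_primes is an assumption made for this example, and this is not how RcppAlgos, numbers, or primes implement it.

```r
## Hypothetical helper: primes in [lo, hi], sieved as one segment.
range_primes <- function(lo, hi) {
  stopifnot(lo >= 2, hi >= lo)
  ## Step 1: plain sieve for the base primes up to sqrt(hi)
  limit <- floor(sqrt(hi))
  base <- rep(TRUE, max(limit, 1))
  base[1] <- FALSE
  i <- 2
  while (i * i <= limit) {
    if (base[i]) base[seq.int(i * i, limit, by = i)] <- FALSE
    i <- i + 1
  }
  base_primes <- as.numeric(which(base))   # doubles, so p * p cannot overflow
  ## Step 2: one flag per number in lo..hi; position j stands for lo + j - 1
  seg <- rep(TRUE, hi - lo + 1)
  for (p in base_primes) {
    ## first multiple of p inside the range, but never p itself
    first <- max(p * p, ceiling(lo / p) * p)
    if (first <= hi) seg[seq.int(first, hi, by = p) - lo + 1] <- FALSE
  }
  lo - 1 + which(seg)
}

range_primes(10, 30)
range_primes(1e9, 1e9 + 100)
```

Memory use depends on the width of the range and on sqrt(hi), not on hi itself, which is why a range near 10^9 is cheap to sieve.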
Primes up to 10 billion in Under 6 Seconds
## primes less than 10 billion
system.time(tenBillion <- RcppAlgos::primeSieve(10^10, nThreads = 8))
user system elapsed
27.319 1.971 5.822
length(tenBillion)
[1] 455052511
## Warning!!!... Large object created
tenBillionSize <- object.size(tenBillion)
print(tenBillionSize, units = "Gb")
3.4 Gb
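The reported size can be checked with a little arithmetic. The sketch below assumes that each prime takes 8 bytes, i.e. that the result is stored as doubles because 10^10 exceeds R's 32-bit integer range; the variable names are made up for this example.

```r
## Back-of-the-envelope check of the object size reported above
n_primes <- 455052511        # length(tenBillion) from the output above
bytes_per_value <- 8         # assumption: doubles, since 10^10 > .Machine$integer.max
size_gib <- n_primes * bytes_per_value / 2^30
round(size_gib, 1)           # compare with the 3.4 Gb printed by object.size
```

The small fixed header of an R vector is ignored here, as it is negligible at this size.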
Primes Over a Range of Very Large Numbers:
Prior to version 2.3.0, we were simply using the same algorithm for numbers of every magnitude. This is okay for smaller numbers, when most of the sieving primes have at least one multiple in each segment (generally, the segment size is limited by the size of the L1 cache, ~32KiB). However, when we are dealing with larger numbers, the sieving primes will contain many numbers that have fewer than one multiple per segment. This situation creates a lot of overhead, as we are performing many worthless checks that pollute the cache. Thus, we observe much slower generation of primes when the numbers are very large. Observe for version 2.2.0 (see Installing older version of R package):
## Install version 2.2.0
## packageurl <- "http://cran.r-project.org/src/contrib/Archive/RcppAlgos/RcppAlgos_2.2.0.tar.gz"
## install.packages(packageurl, repos=NULL, type="source")
system.time(old <- RcppAlgos::primeSieve(1e15, 1e15 + 1e9))
user system elapsed
7.932 0.134 8.067
And now using the cache friendly improvement originally developed by Tomás Oliveira, we see drastic improvements:
## Reinstall current version from CRAN
## install.packages("RcppAlgos"); library(RcppAlgos)
system.time(cacheFriendly <- primeSieve(1e15, 1e15 + 1e9))
user system elapsed
2.462 0.197 2.660 ## Over 3x faster than older versions
system.time(primeSieve(1e15, 1e15 + 1e9, nThreads = 8))
user system elapsed
5.037 0.806 0.981 ## Over 8x faster using multiple threads
Take Away
- There are many great packages available for generating primes.
- If you are looking for speed in general, there is no match to RcppAlgos::primeSieve, especially for larger numbers.
- If you are working with small primes, look no further than randtoolbox::get.primes.
- If you need primes in a range, the packages numbers, primes, & RcppAlgos are the way to go.
- The importance of good programming practices cannot be overemphasized (e.g. vectorization, using correct data types, etc.). This is most aptly demonstrated by the pure base R solution provided by @John. It is concise, clear, and very efficient.
Answer 5:
This method should be faster and simpler.
allPrime <- function(n) {
  primes <- rep(TRUE, n)
  primes[1] <- FALSE
  for (i in 1:sqrt(n)) {
    ## cross off multiples of i starting at i^2; smaller ones are already gone
    if (primes[i]) primes[seq(i^2, n, i)] <- FALSE
  }
  which(primes)
}
0.12 seconds on my computer for n = 1e6.
I implemented this in function AllPrimesUpTo in package primefactr.
Answer 6:
I recommend primegen, Dan Bernstein's implementation of the Atkin-Bernstein sieve. It's very fast and will scale well to other problems. You'll need to pass data out to the program to use it, but I imagine there are ways to do that?
Answer 7:
You can also cheat and use the primes() function in the schoolmath package :D
Answer 8:
The isPrime() function posted above could use sieve(). One only needs to check whether any of the primes up to ceiling(sqrt(x)) divide x with no remainder. The cases 1 and 2 also need to be handled.
isPrime <- function(x) {
  div <- sieve(ceiling(sqrt(x)))
  (x > 1) & ((x == 2) | !any(x %% div == 0))
}
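Answer 9:
for (i in 2:1000) {
  a = (2:(i-1))
  ## remainders of i divided by each candidate divisor
  b = as.matrix(i%%a)
  ## number of candidate divisors that leave a non-zero remainder
  c = colSums(b != 0)
  ## i is prime when exactly i-2 remainders are non-zero
  if (c == i-2)
  {
    print(i)
  }
}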
Answer 10:
Each odd number (i) up to (a) is checked against the list of prime numbers (n) found so far, i.e. the list built while checking the numbers before i.
Thanks for the suggestions:
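prime = function(a){
  n = c(2)               ## primes found so far
  i = 3
  while(i <= a){
    r = 0                ## set before the loop so r exists even when there is no divisor to try
    for(j in n[n <= sqrt(i)]){
      if (i %% j == 0){
        r = 1            ## found a divisor, so i is composite
        break
      }
    }
    if(r != 1){n = c(n, i)}
    i = i + 2            ## skip even numbers
  }
  print(n)
}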
Source: https://stackoverflow.com/questions/3789968/generate-a-list-of-primes-up-to-a-certain-number