I want to print the first 10000 prime numbers. Can anyone give me the most efficient code for this? Clarifications:
This is an old question, but there's something here everyone's missing...
For primes this small, trial division isn't that slow... there are only 25 primes under 100. With so few primes to test, and such small primes, we can pull out a neat trick!
If a is coprime to b, then gcd a b = 1. Coprime. Fun word. Means it doesn't share any prime factors. We can thus test for divisibility by several primes with one GCD call. How many? Well, the product of the first 15 primes is less than 2^64. And the product of the next 10 is also less than 2^64. That's all 25 that we need. But is it worth it?
Let's see:
check x = null $ filter ((==0) . (x `mod`)) $ []
Prelude> length $ filter check [101,103..85600]
>>> 9975
(0.30 secs, 125,865,152 bytes
a = 16294579238595022365 :: Word64
b = 14290787196698157718
pre = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
primes = (pre ++) $ filter ((==1) . gcd a) $ filter ((==1) . gcd b) [99,101..85600]
main = print $ length primes
Prelude> main
>>> 10000
(0.05 secs, 36,387,520 bytes)
A 6 fold improvement there.
(length is to force the list to be computed. By default Haskell prints things 1 Unicode character at a time and so actually printing the list will either dominate the time or dominate the amount of actual code used.)
Of course, this is running in GHCi - a repl running interpreted code - on an old laptop and it is not interpreting any of these numbers as int64s or even BigInts, nor will it even if you ask it to (well, you can force it, but it's ugly and doesn't really help). It is interpreting every single number there as generalized Integer-like things that can be specialized to some particular type via dictionary lookup, and it is traversing a linked list (which is not fused away here as it's not compiled) 3 times. Interestingly, hand fusing the two filters actually slows it down in the REPL.
Let's compile it:
...\Haskell\8.6\Testbed>Primes.exe +RTS -s
10000
606,280 bytes allocated in the heap
Total time 0.000s ( 0.004s elapsed)
Using the RTS report because Windows. Some lines trimmed because they aren't relevant - they were other GC data, or measurements of only part of the execution, and together add up to 0.004s (or less). It's also not constant folding, because Haskell doesn't actually do much of that. If we constant fold ourselves (main = print 10000), we get dramatically lower allocation:
...Haskell\8.6\Testbed>Primes.exe +RTS -s
10000
47,688 bytes allocated in the heap
Total time 0.000s ( 0.001s elapsed)
Literally just enough to load the runtime, then discover there's nothing to do but print a number and exit. Let's add wheel factorization:
wheel = scanl (+) 7 $ cycle [4, 2, 4, 2, 4, 6, 2, 6]
primes = (pre ++) $ filter ((==1) . gcd a) $ filter ((==1) . gcd b) $ takeWhile (<85600) wheel
Total time 0.000s ( 0.003s elapsed)
Cut down approximately 1/3rd relative to our reference of main = print 10000, but there's definitely room for more optimization. It actually stopped to perform a GC in there for example, while with tweaking there shouldn't be any heap use. For some reason, compiling for profiling here actually cuts the runtime down to 2 milliseconds:
Tue Nov 12 21:13 2019 Time and Allocation Profiling Report (Final)
Primes.exe +RTS -p -RTS
total time = 0.00 secs (2 ticks @ 1000 us, 1 processor)
total alloc = 967,120 bytes (excludes profiling overheads)
I'm going to leave this as is for now, I'm pretty sure random jitter is starting to dominate.