F# parallelizing issue when calculating perfect numbers?

前端未结

关注

 2  1681

陌清茗

I am trying to optimize a small program which calculates perfect numbers from a given exponent.

The program runs (almost) perfectly, but when I open the task manager

相关标签:

2条回答

遇见更好的自我

2020-12-20 03:19

One quick comment on speed and parallelisability,

Your isPrime is O(sqrt(n)), and each succesive n is about 2 x as big as the last one, so will take approx 1.5 x as long to calculate, which means that calculating the last numbers will take much longer

I have done some hacking with testing for primality and some things I have found which are useful are:

For big N, (you are testing numbers with 20 digits), the prime density is actually quite low, so you will be doing alot of divisions by composite numbers. A better approach is to precalculate a table of primes (using a sieve) up to some maximum limit (probably determined by amount of memory). Note that you are most likely to find factors with small numbers. Once you run out of memory for your table, you can test the rest of the numbers with your existing function, with a larger starting point.
Another approach is to use multiple threads in the checking. For example, you currently check x,x+4,x+6... as factors. By being slightly cleverer, you can do the number congruent to 1 mod 3 in 1 thread and the numbers congruent to 2 mod 3 in another thread.

No. 2 is simplest, but No. 1 is more effective, and provides potential for doing control flow with OutOfMemoryExceptions which can always be interesting

EDIT: So I implemented both of these ideas, it finds 2305843008139952128 almost instantly, finding 2658455991569831744654692615953842176 takes 7 minutes on my computer (quad core AMD 3200). Most of the time is spent checking 2^61 is prime, so a better algorithm would probably be better for checking the prime numbers: Code here

let swatch = new System.Diagnostics.Stopwatch()
swatch.Start()
let inline PowShift (exp:int32) = 1I <<< exp ;;
let limit = 10000000 //go to a limit, makes table gen slow, but should pay off
printfn "making table"
//returns an array of all the primes up to limit
let table =
    let table = Array.create limit true //use bools in the table to save on memory
    let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
    table.[1] <- false //special case
    [2..tlimit] 
    |> List.iter (fun t -> 
        if table.[t]  then //simple optimisation
            let mutable v = t*2
            while v < limit do
                table.[v] <- false
                v <- v + t)
    let out = Array.create (50847534) 0I //wolfram alpha provides pi(1 billion) - want to minimize memory
    let mutable idx = 0
    for x in [1..(limit-1)] do
        if table.[x] then
            out.[idx] <- bigint x
            idx <- idx + 1
    out |> Array.filter (fun t -> t <> 0I) //wolfram no is for 1 billion as limit, we use a smaller number
printfn "table made"

let rec isploop testprime incr max n=
    if testprime > max then true
    else if n % testprime = 0I then false
    else isploop (testprime + incr) incr max n

let isPrime ( n : bigint) = 
    //first test the table
    let maxFactor = bigint(sqrt(float n))
    match table |> Array.tryFind (fun t -> n % t = 0I && t <= maxFactor) with
    |Some(t) -> 
        false
    |None -> //now slow test
        //I have 4 cores so
        let bases = [|limit;limit+1;limit+3;limit+4|] //uses the fact that 10^x congruent to 1 mod 3
        //for 2 cores, drop last 2 terms above and change 6I to 3I
        match bases |> Array.map (fun t -> async {return isploop (bigint t) 6I maxFactor n}) |> Async.Parallel |> Async.RunSynchronously |> Array.tryFind (fun t -> t = false) with
        |Some(t) -> false
        |None -> true


let pcount = ref 0
let perfectNumbersTwo (n : int) =  
    seq { for i in 2..n do 
           if (isPrime (bigint i)) then
                if (PowShift i) - 1I |> isPrime then
                    pcount := !pcount + 1
                    if !pcount = 9 then
                        swatch.Stop()
                        printfn "total time %f seconds, %i:%i m:s"  (swatch.Elapsed.TotalSeconds) (swatch.Elapsed.Minutes) (swatch.Elapsed.Seconds)
                    yield PowShift (i-1) * ((PowShift i)-1I)
        } 


perfectNumbersTwo 62 |> Seq.iter (printfn "PERFECT: %A") //62 gives 9th number

printfn "done"
System.Console.Read() |> ignore

0 讨论(0)

礼貌的吻别

2020-12-20 03:36
@Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:
```
let lucasLehmer p =
    let m = (PowShift p) - 1I
    let rec loop i acc =
        if i = p-2 then acc
        else loop (i+1) ((acc*acc - 2I)%m)
    (loop 0 4I) = 0I
```
With the Lucas-Lehmer test, I can get first few perfect numbers very fast:
```
let mersenne (i: int) =     
    if i = 2 || (isPrime (bigint i) && lucasLehmer i) then
        let p = PowShift i
        Some ((p/2I) * (p-1I))
    else None

let runPerfects n =
    seq [1..n]
        |> Seq.choose mersenne
        |> Seq.toArray

let m1 = runPerfects 2048;; // Real: 00:00:07.839, CPU: 00:00:07.878, GC gen0: 112, gen1: 2, gen2: 1
```
The Lucas-Lehmer test helps to reduce the time checking prime numbers. Instead of testing divisibility of 2^p-1 which takes O(sqrt(2^p-1)), we use the primality test which is at most O(p^3). With n = 2048, I am able to find first 15 Mersenne numbers in 7.83 seconds. The 15th Mersenne number is the one with i = 1279 and it consists of 770 digits.

I tried to parallelize runPerfects using PSeq module in F# Powerpack. PSeq doesn't preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the primality test is quite balance among indices, the result is quite encouraging:
```
#r "FSharp.Powerpack.Parallel.Seq.dll"    
open Microsoft.FSharp.Collections

let runPerfectsPar n =
    seq [1..n]
        |> PSeq.choose mersenne
        |> PSeq.sort (* align with sequential version *)
        |> PSeq.toArray 

let m2 = runPerfectsPar 2048;; // Real: 00:00:02.288, CPU: 00:00:07.987, GC gen0: 115, gen1: 1, gen2: 0
```
With the same input, the parallel version took 2.28 seconds which is equivalent to 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use Parallel.For construct and partition the input range sensibly.
0 讨论(0)
发布评论:

提交评论
- 加载中...