F# parallelizing issue when calculating perfect numbers?

陌清茗 2020-12-20 02:53

I am trying to optimize a small program which calculates perfect numbers from a given exponent.

The program runs (almost) perfectly, but when I open the task manager

2 Answers
  • 2020-12-20 03:19

    One quick comment on speed and parallelisability:

    Your isPrime is O(sqrt(n)), and each successive n is about twice as big as the last one, so each test takes roughly sqrt(2) ≈ 1.4 times as long as the previous one. This means the last few numbers dominate the running time.

    I have done some hacking with testing for primality and some things I have found which are useful are:

    1. For big N (you are testing numbers with about 20 digits), the prime density is quite low, so you will be doing a lot of divisions by composite numbers. A better approach is to precalculate a table of primes (using a sieve) up to some maximum limit (probably determined by the amount of memory). Note that you are most likely to find factors among small numbers. Once you run out of memory for your table, you can test the rest of the candidates with your existing function, using a larger starting point.

    2. Another approach is to use multiple threads in the checking. For example, you currently check x, x+4, x+6, ... as factors. By being slightly cleverer, you can check the numbers congruent to 1 mod 3 in one thread and the numbers congruent to 2 mod 3 in another thread.

    No. 2 is simplest, but No. 1 is more effective, and provides potential for doing control flow with OutOfMemoryExceptions, which can always be interesting.

    EDIT: I implemented both of these ideas. It finds 2305843008139952128 almost instantly; finding 2658455991569831744654692615953842176 takes 7 minutes on my computer (quad core AMD 3200). Most of the time is spent checking that 2^61 - 1 is prime, so a better primality test would probably help. The code:

    let swatch = new System.Diagnostics.Stopwatch()
    swatch.Start()
    let inline PowShift (exp:int32) = 1I <<< exp ;;
    let limit = 10000000 //go to a limit, makes table gen slow, but should pay off
    printfn "making table"
    //returns an array of all the primes up to limit
    let table =
        let table = Array.create limit true //use bools in the table to save on memory
        let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
        table.[1] <- false //special case
        [2..tlimit] 
        |> List.iter (fun t -> 
            if table.[t]  then //simple optimisation
                let mutable v = t*2
                while v < limit do
                    table.[v] <- false
                    v <- v + t)
        // collect the surviving indices as bigints; this avoids preallocating a
        // 50847534-element buffer (pi(1 billion)) and filtering out the zeros afterwards
        [| for x in 1 .. limit - 1 do
             if table.[x] then yield bigint x |]
    printfn "table made"
    
    let rec isploop testprime incr max n=
        if testprime > max then true
        else if n % testprime = 0I then false
        else isploop (testprime + incr) incr max n
    
    let isPrime ( n : bigint) = 
        //first test the table
        let maxFactor = bigint(sqrt(float n))
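        //compare against maxFactor first so the bigint modulo is skipped for table entries that are too large
        match table |> Array.tryFind (fun t -> t <= maxFactor && n % t = 0I) with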
        |Some(t) -> 
            false
        |None -> //now slow test
            //I have 4 cores so
            let bases = [|limit;limit+1;limit+3;limit+4|] //uses the fact that 10^x congruent to 1 mod 3
            //for 2 cores, drop last 2 terms above and change 6I to 3I
            match bases |> Array.map (fun t -> async {return isploop (bigint t) 6I maxFactor n}) |> Async.Parallel |> Async.RunSynchronously |> Array.tryFind (fun t -> t = false) with
            |Some(t) -> false
            |None -> true
    
    
    let pcount = ref 0
    let perfectNumbersTwo (n : int) =  
        seq { for i in 2..n do 
               if (isPrime (bigint i)) then
                    if (PowShift i) - 1I |> isPrime then
                        pcount := !pcount + 1
                        if !pcount = 9 then
                            swatch.Stop()
                            printfn "total time %f seconds, %i:%i m:s"  (swatch.Elapsed.TotalSeconds) (swatch.Elapsed.Minutes) (swatch.Elapsed.Seconds)
                        yield PowShift (i-1) * ((PowShift i)-1I)
            } 
    
    
    perfectNumbersTwo 62 |> Seq.iter (printfn "PERFECT: %A") //62 gives 9th number
    
    printfn "done"
    System.Console.Read() |> ignore
    
  • 2020-12-20 03:36

    @Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:

    let lucasLehmer p =
        let m = (PowShift p) - 1I
        let rec loop i acc =
            if i = p-2 then acc
            else loop (i+1) ((acc*acc - 2I)%m)
        (loop 0 4I) = 0I
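
    To make the recurrence concrete, here is a small self-contained sketch that records every intermediate residue instead of only the last one. The names lucasLehmerTrace and isMersennePrime are hypothetical helpers introduced for illustration; they are not part of the answer's code, and the test is only valid for odd prime exponents.

    ```fsharp
    // Hypothetical helper (not from the answer): lists the Lucas-Lehmer
    // residues s_0 .. s_(p-2) modulo 2^p - 1 for an odd prime exponent p.
    let lucasLehmerTrace (p: int) : bigint list =
        let m = (1I <<< p) - 1I                  // Mersenne number 2^p - 1
        let step (s: bigint) = (s * s - 2I) % m  // s_(k+1) = s_k^2 - 2 (mod m)
        List.scan (fun s _ -> step s) 4I [1 .. p - 2]

    // For an odd prime p, 2^p - 1 is prime exactly when the last residue is zero.
    let isMersennePrime (p: int) = List.last (lucasLehmerTrace p) = 0I

    printfn "%A" (lucasLehmerTrace 7)  // residues for 2^7 - 1 = 127
    printfn "%b" (isMersennePrime 11)  // 2^11 - 1 = 2047 = 23 * 89
    ```

    For p = 7 the residues are 4, 14, 67, 42, 111, 0, so 127 is prime; for p = 11 the final residue is non-zero, which matches 2047 = 23 * 89.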
    

    With the Lucas-Lehmer test, I can get the first few perfect numbers very fast:

    let mersenne (i: int) =     
        if i = 2 || (isPrime (bigint i) && lucasLehmer i) then
            let p = PowShift i
            Some ((p/2I) * (p-1I))
        else None
    
    let runPerfects n =
        seq [1..n]
            |> Seq.choose mersenne
            |> Seq.toArray
    
    let m1 = runPerfects 2048;; // Real: 00:00:07.839, CPU: 00:00:07.878, GC gen0: 112, gen1: 2, gen2: 1
    

    The Lucas-Lehmer test reduces the time spent checking primality. Instead of testing 2^p-1 by trial division, which takes O(sqrt(2^p-1)) divisions, we use a primality test that costs at most O(p^3). With n = 2048, I am able to find the first 15 perfect numbers (one per Mersenne prime) in 7.83 seconds. The 15th is the one with i = 1279: the Mersenne prime 2^1279-1 has 386 digits, and the corresponding perfect number has 770 digits.
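
    The sizes involved can be checked directly. This is a minimal sketch that only uses bigint arithmetic; the names mersennePrime, perfect and digits are introduced here for illustration, and the exponent 1279 is taken from the text above.

    ```fsharp
    // Exponent of the 15th Mersenne prime, as reported in the answer.
    let p = 1279
    let mersennePrime = (1I <<< p) - 1I             // 2^1279 - 1
    let perfect = (1I <<< (p - 1)) * mersennePrime  // 2^1278 * (2^1279 - 1)

    // Decimal digit count via the string representation.
    let digits (n: bigint) = n.ToString().Length

    printfn "Mersenne prime digits: %d" (digits mersennePrime)  // 386
    printfn "perfect number digits: %d" (digits perfect)        // 770
    ```

    The 770-digit figure therefore refers to the perfect number rather than to the Mersenne prime itself.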

    I tried to parallelize runPerfects using the PSeq module in the F# PowerPack. PSeq doesn't preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the cost of the primality test is fairly well balanced across indices, the result is quite encouraging:

    #r "FSharp.Powerpack.Parallel.Seq.dll"    
    open Microsoft.FSharp.Collections
    
    let runPerfectsPar n =
        seq [1..n]
            |> PSeq.choose mersenne
            |> PSeq.sort (* align with sequential version *)
            |> PSeq.toArray 
    
    let m2 = runPerfectsPar 2048;; // Real: 00:00:02.288, CPU: 00:00:07.987, GC gen0: 115, gen1: 1, gen2: 0
    

    With the same input, the parallel version took 2.28 seconds, which is a 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use the Parallel.For construct and partition the input range sensibly.
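
    As a rough sketch of what a Parallel.For variant might look like, the version below writes each result into its own array slot, so the output stays ordered without a sort. It is untimed and makes several assumptions: isPrimeExponent is a hypothetical stand-in for the isPrime used above (plain trial division on the small exponent), runPerfectsParallelFor is a name introduced here, and the work is split by the default scheduler rather than by a hand-tuned partition.

    ```fsharp
    open System.Threading.Tasks

    // Assumption: simple trial division on the exponent, standing in for
    // the isPrime that the answer reuses from elsewhere in the thread.
    let isPrimeExponent (n: int) =
        n >= 2 && Seq.forall (fun d -> n % d <> 0) (seq { 2 .. int (sqrt (float n)) })

    // Lucas-Lehmer test as in the answer, valid for odd prime p.
    let lucasLehmer (p: int) =
        let m = (1I <<< p) - 1I
        let rec loop i acc =
            if i = p - 2 then acc
            else loop (i + 1) ((acc * acc - 2I) % m)
        loop 0 4I = 0I

    let mersenne (i: int) =
        if i = 2 || (isPrimeExponent i && lucasLehmer i) then
            let p = 1I <<< i
            Some ((p / 2I) * (p - 1I))
        else None

    // Each index writes only its own slot, so no locking is needed and
    // the results come out in ascending order of the exponent.
    let runPerfectsParallelFor (n: int) : bigint[] =
        let slots : bigint option[] = Array.create (n + 1) None
        Parallel.For(1, n + 1, (fun i -> slots.[i] <- mersenne i)) |> ignore
        Array.choose id slots

    let found = runPerfectsParallelFor 128
    printfn "%d perfect numbers found" found.Length
    ```

    Larger exponents are far more expensive than small ones, so how the range is divided between workers matters; this sketch does not attempt to tune that.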
