Project Euler #14 Tips in Haskell? [closed]


Your program is not as slow as you might think…

First of all, your program runs fine and finishes in under two minutes if you compile with -O2 and increase the stack size (I used +RTS -K100m, but the value you need may differ on your system):

$ .\collatz.exe +RTS -K100m -s
  65,565,993,768 bytes allocated in the heap
  16,662,910,752 bytes copied during GC
      77,042,796 bytes maximum residency (1129 sample(s))
       5,199,140 bytes maximum slop
             184 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     124724 colls,     0 par   18.41s   18.19s     0.0001s    0.0032s
  Gen  1      1129 colls,     0 par   16.67s   16.34s     0.0145s    0.1158s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   39.98s  ( 41.17s elapsed)
  GC      time   35.08s  ( 34.52s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   75.06s  ( 75.69s elapsed)

  %GC     time      46.7%  (45.6% elapsed)

  Alloc rate    1,639,790,387 bytes per MUT second

  Productivity  53.3% of total user, 52.8% of total elapsed

…but that's still slow

A productivity of ~50% means that the GC is using half the time we spend staring at the screen, waiting for our result. In our case we create too much garbage by iterating the sequence for every value.

Improvements

The Collatz sequence is a recursive sequence. Therefore, we should define it as a recursive sequence instead of an iterative one and have a look at what happens.

colSeq 1 = [1]
colSeq n 
  | even n    = n : colSeq (n `div` 2) 
  | otherwise = n : colSeq (3 * n + 1)
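
As a rough sketch of how colSeq might be plugged into a complete solution: the question's own driver is not shown here, so the name `longestChain` and its use of `maximumBy` are assumptions made purely for illustration.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Collatz sequence starting at n and ending at 1 (same definition as above).
colSeq :: Integer -> [Integer]
colSeq 1 = [1]
colSeq n
  | even n    = n : colSeq (n `div` 2)
  | otherwise = n : colSeq (3 * n + 1)

-- Hypothetical driver (not from the original post): pair every start value
-- with the number of terms in its chain and keep the longest one.
longestChain :: Integer -> (Integer, Int)
longestChain limit =
  maximumBy (comparing snd) [ (n, length (colSeq n)) | n <- [1 .. limit] ]
```

For example, `length (colSeq 13)` is 10 (13, 40, 20, 10, 5, 16, 8, 4, 2, 1), and `longestChain 9` evaluates to `(9,20)`.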

The list is a fundamental type in Haskell, so GHC should have some nifty optimizations for it (-O2). So let's try this one:

Result

$ .\collatz_rec.exe +RTS -s
  37,491,417,368 bytes allocated in the heap
       4,288,084 bytes copied during GC
          41,860 bytes maximum residency (2 sample(s))
          19,580 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     72068 colls,     0 par    0.22s    0.22s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   32.89s  ( 33.12s elapsed)
  GC      time    0.22s  (  0.22s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   33.11s  ( 33.33s elapsed)

  %GC     time       0.7%  (0.7% elapsed)

  Alloc rate    1,139,881,573 bytes per MUT second

  Productivity  99.3% of total user, 98.7% of total elapsed

Note that we're now up to 99% productivity in ~80% of the MUT time (compared to the original version). With this small change alone, the total runtime drops from about 75 s to about 33 s.

Wait, there's more!

There's one thing that's rather strange. Why are we calculating the length of both 1024 and 512? After all, the latter cannot create a longer Collatz sequence.

Improvements

To exploit this, we must see the problem as one big task, and not as a map over independent inputs. We need to keep track of the start values that are still worth checking, and we want to drop those values that an earlier sequence has already visited.

We use Data.Set for this:

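import Data.Function (on)
import Data.List (maximumBy)
import qualified Data.Set as S

-- Take the smallest remaining start value, record its chain length, and
-- drop every value on that chain from the set before recursing.
problem_14 :: S.Set Integer -> [(Integer, Integer)]
problem_14 s
  | S.null s  = []
  | otherwise = (c, fromIntegral $ length csq) : problem_14 rest
  where (c, rest') = S.deleteFindMin s
        csq        = colSeq c
        rest       = rest' `S.difference` S.fromList csq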

And we use problem_14 like this:

main = print $ maximumBy (compare `on` snd) $ problem_14 $ S.fromList [1..999999]
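
To see why this saves work, here is a small sketch that runs the same function on the start values 1 to 10. The name `smallRun` is hypothetical and only there for illustration; colSeq and problem_14 are repeated so the snippet stands on its own.

```haskell
import qualified Data.Set as S

colSeq :: Integer -> [Integer]
colSeq 1 = [1]
colSeq n
  | even n    = n : colSeq (n `div` 2)
  | otherwise = n : colSeq (3 * n + 1)

problem_14 :: S.Set Integer -> [(Integer, Integer)]
problem_14 s
  | S.null s  = []
  | otherwise = (c, fromIntegral $ length csq) : problem_14 rest
  where (c, rest') = S.deleteFindMin s
        csq        = colSeq c
        rest       = rest' `S.difference` S.fromList csq

-- Illustration only: which start values are actually evaluated for 1..10?
smallRun :: [(Integer, Integer)]
smallRun = problem_14 (S.fromList [1 .. 10])
-- smallRun == [(1,1),(2,2),(3,8),(6,9),(7,17),(9,20)]
```

Only six of the ten start values are evaluated: the chain starting at 3 already passes through 10, 5, 8 and 4, so those are removed from the set before they are ever picked as a minimum.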

Result

$ .\collatz_set.exe +RTS -s
  18,405,282,060 bytes allocated in the heap
   1,645,842,328 bytes copied during GC
      27,446,972 bytes maximum residency (40 sample(s))
         373,056 bytes maximum slop
              79 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     35193 colls,     0 par    2.17s    2.03s     0.0001s    0.0002s
  Gen  1        40 colls,     0 par    0.84s    0.77s     0.0194s    0.0468s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   14.91s  ( 15.17s elapsed)
  GC      time    3.02s  (  2.81s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   17.92s  ( 17.98s elapsed)

  %GC     time      16.8%  (15.6% elapsed)

  Alloc rate    1,234,735,903 bytes per MUT second

  Productivity  83.2% of total user, 82.9% of total elapsed

We lose some productivity, but that's reasonable. After all, we're now using Set instead of the list, and we use 79 MB instead of 1 MB. However, our program now runs in about 18 s instead of 33 s, which is only about 25% of the original 75 s.

Using `ST`

Inspiration (C++)

int main(){
  std::vector<bool> Q(1000000,true);

  unsigned long long max_l = 0, max_c = 1;

  for(unsigned long i = 1; i < Q.size(); ++i){
    if(!Q[i])
      continue;
    unsigned long long c = i, l = 0;
    while(c != 1){
      if(c < Q.size()) Q[c] = false;
      c = c % 2 == 0 ? c / 2 : 3 * c + 1;
      l++;
    }
    if(l > max_l){
      max_l = l;
      max_c = i;
    }
  }
  std::cout << max_c << std::endl;
}

This program runs in 130 ms. Our best version so far takes more than 100 times as long. We can fix that.

Haskell

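-- Imports assumed here; the original post does not show them, and the
-- unboxed flavour of the mutable vector is a guess.
import Control.Monad (forM_, when)
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
import qualified Data.Vector.Unboxed.Mutable as V

problem_14_vector_st :: Int -> (Int, Int)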
problem_14_vector_st limit = 
  runST $ do
    q <- V.replicate (limit+1) True    
    best <- newSTRef (1,1)
    forM_ [1..limit] $ \i -> do
      b <- V.read q i
      when b $ do
        let csq = colSeq $ fromIntegral i
        let l   = fromIntegral $ length csq
        forM_ (map fromIntegral csq) $ \j-> 
          when (j<= limit && j>= 0) $  V.write q j False
        m <- fmap snd $ readSTRef best
        when (l > m) $ writeSTRef best (i,l)
    readSTRef best

Result

$ collatz_vector_st.exe +RTS -s
   2,762,282,216 bytes allocated in the heap
      10,021,016 bytes copied during GC
       1,026,580 bytes maximum residency (2 sample(s))
          21,684 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5286 colls,     0 par    0.02s    0.02s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.09s  (  3.08s elapsed)
  GC      time    0.02s  (  0.02s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    3.11s  (  3.11s elapsed)

  %GC     time       0.5%  (0.7% elapsed)

  Alloc rate    892,858,898 bytes per MUT second

  Productivity  99.5% of total user, 99.6% of total elapsed

~3 seconds. Someone else might know more tricks, but that's the most I could squeeze out of Haskell.
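
One possible further step, sketched here without any timing claims, is to follow the C++ loop more literally and never build the intermediate list. The sketch below uses STUArray from the array package instead of a mutable vector; the name `problem14Direct` is hypothetical, and the code assumes a 64-bit `Int` so that intermediate values do not overflow.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (forM_, when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Returns (start value, number of steps to reach 1), like the C++ version.
-- Assumption: Int is 64 bits wide.
problem14Direct :: Int -> (Int, Int)
problem14Direct limit = runST $ do
  -- q ! i stays True while i is still worth trying as a start value.
  q    <- newArray (1, limit) True :: ST s (STUArray s Int Bool)
  best <- newSTRef (1, 0)
  forM_ [1 .. limit] $ \i -> do
    alive <- readArray q i
    when alive $ do
      -- Walk the chain from c down to 1, clearing the flag of every value
      -- that lies inside the array, and count the steps in l.
      let walk !c !l
            | c == 1    = return l
            | otherwise = do
                when (c <= limit) $ writeArray q c False
                walk (if even c then c `quot` 2 else 3 * c + 1) (l + 1)
      l <- walk i 0
      (_, m) <- readSTRef best
      when (l > m) $ writeSTRef best (i, l)
  readSTRef best
```

Note that this counts steps rather than sequence elements, as the C++ program does, so `problem14Direct 9` gives `(9,19)` where the list-based versions report a length of 20.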


清酒与你

2020-12-21 18:58

The main source of the time and memory issues is that you build the whole Collatz sequences, whereas for the task you only need their lengths, and unfortunately laziness doesn't save the day. The simple solution that calculates only the lengths runs in a few seconds:

simpleCol :: Integer -> Int
simpleCol 1 = 1
simpleCol x | even x = 1 + simpleCol (x `quot` 2)
            | otherwise = 1 + simpleCol (3 * x + 1)

problem14 = maximum $ map simpleCol [1 .. 999999]

It also takes much less memory and doesn't need an enlarged stack:

$> ./simpleCollatz +RTS -s
simpleCollatz +RTS -s
   2,517,321,124 bytes allocated in the heap
         217,468 bytes copied during GC
          41,860 bytes maximum residency (2 sample(s))
          19,580 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4804 colls,     0 par    0.00s    0.02s     0.0000s    0.0046s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    4.47s  (  4.49s elapsed)
  GC      time    0.00s  (  0.02s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    4.47s  (  4.52s elapsed)

  %GC     time       0.0%  (0.5% elapsed)

  Alloc rate    563,316,615 bytes per MUT second

  Productivity 100.0% of total user, 98.9% of total elapsed
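
Note that problem14 as written yields the longest length rather than the start value that produces it. A minimal sketch that keeps both, using a strict left fold; the helper name `longestStart` is an assumption for illustration and not part of the answer above.

```haskell
import Data.List (foldl')

simpleCol :: Integer -> Int
simpleCol 1 = 1
simpleCol x | even x    = 1 + simpleCol (x `quot` 2)
            | otherwise = 1 + simpleCol (3 * x + 1)

-- Hypothetical helper: returns (chain length, start value) for the longest
-- chain among the start values 1..limit. On ties the smaller start wins.
longestStart :: Integer -> (Int, Integer)
longestStart limit = foldl' step (1, 1) [2 .. limit]
  where
    step best@(bestLen, _) n =
      let len = simpleCol n
      in if len > bestLen then (len, n) else best
```

For example, `longestStart 9` evaluates to `(20,9)`.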

To illustrate the proposed caching solution: there is a nifty technique called memoization. Arguably the easiest way to use it is to install the memoize package:

import Data.Function.Memoize

memoCol :: Integer -> Int
memoCol = memoFix mc where
    mc _ 1 = 1
    mc f x | even x = 1 + f (x `quot` 2)
           | otherwise = 1 + f (3 * x + 1)

This cuts down the total heap allocation, but the GC has to work heavily to maintain the cached values, and in this run both the total time and the peak memory end up higher than for simpleCol:

$> ./memoCollatz +RTS -s
memoCollatz +RTS -s
   1,577,954,668 bytes allocated in the heap
   1,056,591,780 bytes copied during GC
     303,942,300 bytes maximum residency (12 sample(s))
         341,468 bytes maximum slop
             616 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3003 colls,     0 par    1.11s    1.19s     0.0004s    0.0010s
  Gen  1        12 colls,     0 par    3.48s    3.65s     0.3043s    1.7065s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    7.55s  (  7.50s elapsed)
  GC      time    4.59s  (  4.84s elapsed)
  EXIT    time    0.00s  (  0.05s elapsed)
  Total   time   12.14s  ( 12.39s elapsed)

  %GC     time      37.8%  (39.1% elapsed)

  Alloc rate    209,087,160 bytes per MUT second

  Productivity  62.2% of total user, 60.9% of total elapsed


你的背包

2020-12-21 19:04
Caching the values of integers you've already hit will save you a lot of time. If you toss in the number 1234 and find that it takes 273 steps to get to 1, associate the two values: 1234->273.

Now if you ever hit 1234 while in a sequence, you don't have to take 273 more steps to find the answer. Just add 273 to your current step count and you know the length of the sequence.

Do this for every number you calculate, even the ones in the middle of a sequence. For example, if you are at 1234 and you don't have a value yet, do the step (divide by 2) and calculate and cache the value for 617. You cache almost all the important values really quickly this way. There are some really long chains that you'll end up on again and again.

The easiest way to cache all the values as you go is to make a recursive function. Like this (in pseudo-code):
```
function collatz(number) {
    if number is 1: return 1
    else if number is in cache: return cached value
    else perform step: newnumber = number / 2 if even, 3 * number + 1 if odd
    steps = collatz(newnumber) + 1 //+1 for the step we just took
    cache steps as the result for number
    return steps
}
```
Hopefully Haskell won't have problems with the depths of recursion that you'll end up in like this. However, if Haskell doesn't like it, you can implement the same thing with an explicit stack; it is just less intuitive.
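
The pseudo-code above could be translated to Haskell roughly as follows. This is only a sketch: it threads a `Data.Map` cache through the recursion by hand, and the names `Cache`, `collatzCached` and `longestCached` are made up for illustration.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Maps a number to the length (number of terms) of its chain.
type Cache = M.Map Integer Int

-- Chain length of n; returns the updated cache alongside the result.
collatzCached :: Cache -> Integer -> (Int, Cache)
collatzCached cache 1 = (1, cache)
collatzCached cache n =
  case M.lookup n cache of
    Just steps -> (steps, cache)             -- already known: reuse it
    Nothing    ->
      let next           = if even n then n `quot` 2 else 3 * n + 1
          (rest, cache') = collatzCached cache next
          steps          = rest + 1          -- +1 for the step we just took
      in (steps, M.insert n steps cache')    -- cache the result for n

-- (start value, chain length) of the longest chain for starts 1..limit.
longestCached :: Integer -> (Integer, Int)
longestCached limit = fst (foldl' go ((1, 1), M.empty) [2 .. limit])
  where
    go (best@(_, bestLen), cache) n =
      let (len, cache') = collatzCached cache n
      in if len > bestLen then ((n, len), cache') else (best, cache')
```

For example, `collatzCached M.empty 13` returns a length of 10 and a cache with nine entries (every number on the chain except 1), and `longestCached 9` evaluates to `(9,20)`.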
再見小時候

2020-12-21 19:07
Make sure you use Integer instead of Int, because intermediate values can overflow a 32-bit Int, and the wrapped-around values then break the recursion.
```
collatz :: Integer -> Integer
```
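
A small sketch to check this claim: it computes the largest value a chain reaches using Integer and compares it against the Int32 range. The helper names `peak` and `exceedsInt32` are hypothetical. Keep in mind that on a 64-bit GHC `Int` is 64 bits wide, so the overflow only bites on 32-bit platforms or with an explicit Int32.

```haskell
import Data.Int (Int32)

-- Largest value reached on the way from n down to 1, computed with
-- arbitrary-precision Integer so that it cannot overflow.
peak :: Integer -> Integer
peak n = maximum (takeWhile (/= 1) (iterate step n) ++ [1])
  where
    step x | even x    = x `quot` 2
           | otherwise = 3 * x + 1

-- True if the chain starting at n leaves the range of a 32-bit signed Int.
exceedsInt32 :: Integer -> Bool
exceedsInt32 n = peak n > fromIntegral (maxBound :: Int32)
```

For example, `peak 27` is 9232, which fits comfortably, whereas `exceedsInt32 113383` is `True`, so a start value well below one million already leaves the 32-bit range.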
