Always guaranteed evaluation order of `seq` (with strange behavior of `pseq` in addition)

既然无缘 2021-02-12 20:44

The documentation of seq function says the following:

A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that both a and b will be evaluated before seq returns a value.

3 Answers
  •  遥遥无期
    2021-02-12 21:44

    Edit: My theory was foiled, as the timings I observed were in fact heavily skewed by the influence of profiling itself; with profiling off, the data goes against the theory. Moreover, the timings vary quite a bit between versions of GHC. I am collecting better observations even now, and I will further edit this answer as I arrive at a conclusion.


    Concerning the question "why pseq is slower", I have a theory.

      • Let us re-phrase acc' `seq` go acc' xs as strict (go (strict acc') xs).
      • Similarly, acc' `pseq` go acc' xs is re-phrased as lazy (go (strict acc') xs).
      • Now, let us re-phrase go acc (x:xs) = let ... in ... to go acc (x:xs) = strict (go (x + acc) xs) in the case of seq.
      • And to go acc (x:xs) = lazy (go (x + acc) xs) in the case of pseq.

    Now, it is easy to see that, in the case of pseq, go gets assigned a lazy thunk that will be evaluated at some later point. In the definition of sum, go never appears to the left of pseq, and thus, during the run of sum, the evaluation will never be forced. Moreover, this happens for every recursive call of go, so thunks accumulate.
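    Incidentally, this rephrasing is close to how GHC itself defines pseq: x `seq` lazy y, where lazy is a pseudo-function from GHC.Exts that hides its argument from the strictness analyser. A minimal sketch of that reconstruction (pseq' is my hypothetical name for it, to avoid clashing with the real one):

    ```haskell
    import GHC.Exts (lazy)

    -- Sketch of pseq in terms of seq: `lazy` stops the strictness analyser
    -- from seeing that y is demanded, so x is observably forced first and
    -- the right-hand side stays a thunk until it is actually needed.
    pseq' :: a -> b -> b
    pseq' x y = x `seq` lazy y

    main :: IO ()
    main = print ((1 + 1 :: Int) `pseq'` "left argument forced first")
    ```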

    This is a theory built from thin air, but I do have partial proof. Specifically, I found that go allocates memory linearly in the pseq case but not in the seq case. You may see for yourself if you run the following shell commands:

    for file in SumNaive.hs SumPseq.hs SumSeq.hs 
    do
        stack ghc                \
            --library-profiling  \
            --package parallel   \
            --                   \
            $file                \
            -main-is ${file%.hs} \
            -o ${file%.hs}       \
            -prof                \
            -fprof-auto
    done
    
    for file in SumNaive.hs SumSeq.hs SumPseq.hs
    do
        time ./${file%.hs} +RTS -P
    done
    

    -- And compare the memory allocation of the go cost centre.

    COST CENTRE             ...  ticks     bytes
    SumNaive.prof:sum.go    ...    782 559999984
    SumPseq.prof:sum.go     ...    669 800000016
    SumSeq.prof:sum.go      ...    161         0
    

    postscriptum

    Since there appears to be disagreement about which optimizations actually have what effect, I am posting my exact source code and time measurements, so that there is a common baseline.

    SumNaive.hs

    module SumNaive where
    
    import Prelude hiding (sum)
    
    sum :: Num a => [a] -> a
    sum = go 0
      where
        go acc []     = acc
        go acc (x:xs) = go (x + acc) xs
    
    main = print $ sum [1..10^7]
    

    SumSeq.hs

    module SumSeq where
    
    import Prelude hiding (sum)
    
    sum :: Num a => [a] -> a
    sum = go 0
      where
        go acc []     = acc
        go acc (x:xs) = let acc' = x + acc
                        in acc' `seq` go acc' xs
    
    main = print $ sum [1..10^7]
    

    SumPseq.hs

    module SumPseq where
    
    import Prelude hiding (sum)
    import Control.Parallel (pseq)
    
    sum :: Num a => [a] -> a
    sum = go 0
      where
        go acc []     = acc
        go acc (x:xs) = let acc' = x + acc
                        in acc' `pseq` go acc' xs
    
    main = print $ sum [1..10^7]
    

    Time without optimizations:

    ./SumNaive +RTS -P  4.72s user 0.53s system 99% cpu 5.254 total
    ./SumSeq +RTS -P  0.84s user 0.00s system 99% cpu 0.843 total
    ./SumPseq +RTS -P  2.19s user 0.22s system 99% cpu 2.408 total
    

    Time with -O:

    ./SumNaive +RTS -P  0.58s user 0.00s system 99% cpu 0.584 total
    ./SumSeq +RTS -P  0.60s user 0.00s system 99% cpu 0.605 total
    ./SumPseq +RTS -P  1.91s user 0.24s system 99% cpu 2.147 total
    

    Time with -O2:

    ./SumNaive +RTS -P  0.57s user 0.00s system 99% cpu 0.570 total
    ./SumSeq +RTS -P  0.61s user 0.01s system 99% cpu 0.621 total
    ./SumPseq +RTS -P  1.92s user 0.22s system 99% cpu 2.137 total
    

    It may be seen that:

    • The naive variant has poor performance without optimizations, but excellent performance with either -O or -O2 -- to the extent that it outperforms all the others.

    • The seq variant has good performance that is improved very little by optimizations, so that with either -O or -O2 the naive variant outperforms it.

    • The pseq variant has consistently poor performance: about twice as fast as the naive variant without optimization, but four times slower than the others with either -O or -O2. Optimizations affect it about as little as they do the seq variant.
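
    For comparison, the idiomatic way to get the strictness of the seq variant is Data.List.foldl', which forces the accumulator on each step. A sketch equivalent to SumSeq.hs above:

    ```haskell
    import Prelude hiding (sum)
    import Data.List (foldl')

    -- foldl' forces the accumulator at every step, just like the explicit
    -- `seq` on acc' in SumSeq.hs, so no chain of (+) thunks builds up.
    sum :: Num a => [a] -> a
    sum = foldl' (+) 0

    main :: IO ()
    main = print $ sum [1..10^7 :: Int]   -- 50000005000000
    ```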
