Always guaranteed evaluation order of `seq` (with strange behavior of `pseq` in addition)

前端未结

关注

 3  1097

既然无缘 2021-02-12 20:44

The documentation of seq function says the following:

A note on evaluation order: the expression seq a b does not guarantee that

3条回答

遥遥无期 (楼主)

2021-02-12 21:44
Edit: My theory foiled as the timings I observed were in fact heavily skewed by the influence of profiling itself; with profiling off, the data goes against the theory. Moreover, the timings vary quite a bit between versions of GHC. I am collecting better observations even now, and I will further edit this answer as I arrive to a conclusive point.

Concerning the question "why pseq is slower", I have a theory.
1. - Let us re-phrase acc' `seq` go acc' xs as strict (go (strict acc') xs).
  - Similarly, acc' `pseq` go acc' xs is re-phrased as lazy (go (strict acc') xs).
2. - Now, let us re-phrase go acc (x:xs) = let ... in ... to go acc (x:xs) = strict (go (x + acc) xs) in the case of seq.
  - And to go acc (x:xs) = lazy (go (x + acc) xs) in the case of pseq.
Now, it is easy to see that, in the case of pseq, go gets assigned a lazy thunk that will be evaluated at some later point. In the definition of sum, go never appears to the left of pseq, and thus, during the run of sum, the evaulation will not at all be forced. Moreover, this happens for every recursive call of go, so thunks accumulate.

This is a theory built from thin air, but I do have a partial proof. Specifically, I did find out that go allocates linear memory in pseq case but not in the case of seq. You may see for yourself if you run the following shell commands:
```
for file in SumNaive.hs SumPseq.hs SumSeq.hs 
do
    stack ghc                \
        --library-profiling  \
        --package parallel   \
        --                   \
        $file                \
        -main-is ${file%.hs} \
        -o ${file%.hs}       \
        -prof                \
        -fprof-auto
done

for file in SumNaive.hs SumSeq.hs SumPseq.hs
do
    time ./${file%.hs} +RTS -P
done
```
-- And compare the memory allocation of the go cost centre.
```
COST CENTRE             ...  ticks     bytes
SumNaive.prof:sum.go    ...    782 559999984
SumPseq.prof:sum.go     ...    669 800000016
SumSeq.prof:sum.go      ...    161         0
```
postscriptum

Since there appears to be discord on the question of which optimizations actually play to what effect, I am putting my exact source code and time measures, so that there is a common baseline.

SumNaive.hs
```
module SumNaive where

import Prelude hiding (sum)

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (x + acc) xs

main = print $ sum [1..10^7]
```
SumSeq.hs
```
module SumSeq where

import Prelude hiding (sum)

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = x + acc
                    in acc' `seq` go acc' xs

main = print $ sum [1..10^7]
```
SumPseq.hs
```
module SumPseq where

import Prelude hiding (sum)
import Control.Parallel (pseq)

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = x + acc
                    in acc' `pseq` go acc' xs

main = print $ sum [1..10^7]
```
Time without optimizations:
```
./SumNaive +RTS -P  4.72s user 0.53s system 99% cpu 5.254 total
./SumSeq +RTS -P  0.84s user 0.00s system 99% cpu 0.843 total
./SumPseq +RTS -P  2.19s user 0.22s system 99% cpu 2.408 total
```
Time with -O:
```
./SumNaive +RTS -P  0.58s user 0.00s system 99% cpu 0.584 total
./SumSeq +RTS -P  0.60s user 0.00s system 99% cpu 0.605 total
./SumPseq +RTS -P  1.91s user 0.24s system 99% cpu 2.147 total
```
Time with -O2:
```
./SumNaive +RTS -P  0.57s user 0.00s system 99% cpu 0.570 total
./SumSeq +RTS -P  0.61s user 0.01s system 99% cpu 0.621 total
./SumPseq +RTS -P  1.92s user 0.22s system 99% cpu 2.137 total
```
It may be seen that:
- Naive variant has poor performance without optimizations, but excellent performance with either -O or -O2 -- to the extent that it outperforms all others.
- seq variant has a good performance that's very little improved by optimizations, so that with either -O or -O2 the Naive variant outperforms it.
- pseq variant has consistently poor performance, about twice better than Naive variant without optimization, and four times worse than others with either -O or -O2. Optimization affects it about as little as the seq variant.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...