How to use Criterion to measure the performance of Haskell programs?


The posted benchmark is erroneously slow... or is it

Are you sure it's erroneous? You're touching (well, the "nf" call is touching) 2 million boxed elements - that's 4 million pointers. You can call this erroneous if you want, but the issue is just what you think you're measuring compared to what you really are measuring.

Sharing Data Between Benchmarks

Data sharing can be accomplished through partial application. In my benchmarks I commonly have

let var = somethingCommon in
defaultMain [ bench "one" (nf (func1 var) input1)
            , bench "two" (nf (func2 var) input2)]

Avoiding Reuse in the Presence of Lazy Evaluation

Criterion avoids sharing by separating out your function and your input. You have signatures such as:

funcToBenchmark :: (NFData b) => a -> b
inputForFunc :: a

In Haskell, every time you apply funcToBenchmark to inputForFunc it creates a new thunk that needs to be evaluated. There is no sharing unless you reuse the same variable name as a previous computation. There is no automatic memoization - this seems to be a common misunderstanding.
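To see this concretely, here is a small stand-alone sketch (the function name expensive is made up for illustration, and Debug.Trace is used only to make evaluations visible):

```haskell
import Debug.Trace (trace)

-- A hypothetical "expensive" function; the trace fires once per evaluation.
expensive :: Int -> Int
expensive n = trace "evaluating expensive" (n * 2)

main :: IO ()
main = do
  -- Two separate applications build two separate thunks:
  -- the trace fires twice, once per application.
  print (expensive 21)
  print (expensive 21)
  -- A single named binding is shared: even though 'shared'
  -- appears twice below, the trace fires only once.
  let shared = expensive 21
  print (shared + shared)
```

The trace output (on stderr) shows three evaluations in total, not four: two for the separate applications, one for the shared binding.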

Notice the nuance in what isn't shared: the final result isn't shared, but the input is. If generating the input is what you want to benchmark (getRandList, in this case), then benchmark that, and not just the identity function under nf:

main = do
    gen <- getStdGen
    let size   = 2097152
        inData = getRandList gen size
        inVec  = V.fromList inData
    defaultMain
      [ bench "get input for real" $ nf (getRandList gen) size
      , bench "get input for real and run haarDWT and listify a vector" $
          nf (V.toList . haarDWT . V.fromList . getRandList gen) size
      , bench "screw generation, how fast is haarDWT" $ whnf haarDWT inVec
      ] -- for unboxed vectors, whnf is sufficient

Interpreting Data

The third benchmark is rather instructive. Let's look at what Criterion prints out:

benchmarking screw generation, how fast is haarDWT
collecting 100 samples, 1 iterations each, in estimated 137.3525 s
bootstrapping with 100000 resamples
mean: 134.7204 ms, lb 134.5117 ms, ub 135.0135 ms, ci 0.950

Based on a single run, Criterion estimated it would take 137 seconds to collect its 100 samples. About ten seconds later it was done - what happened? Well, the first run forced the entire input (inVec), which was expensive. The subsequent runs found a value instead of a thunk, so we truly benchmarked haarDWT and not the StdGen RNG (which is known to be painfully slow).
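One way to keep that first expensive run out of the numbers is to force the input to normal form yourself before calling defaultMain (newer Criterion versions also provide an env combinator for exactly this purpose). A minimal sketch using only base and deepseq, with a cheap stand-in list in place of the RNG-generated input:

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- Stand-in for the generated input (getRandList gen size in the post).
  let inData = map (* 2) [1 .. 1000000] :: [Int]
  -- Force the whole list to normal form up front, so the first
  -- benchmark sample doesn't also pay the cost of building it.
  _ <- evaluate (force inData)
  -- ... now hand inData to defaultMain / bench as before ...
  print (sum inData)
```

With the input fully evaluated before the benchmark loop starts, every sample measures the same thing, and Criterion's initial time estimate will no longer be skewed by a one-off construction cost.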
