How to use Criterion to measure the performance of Haskell programs?


The posted benchmark is erroneously slow... or is it

Are you sure it's erroneous? You're touching (well, the "nf" call is touching) 2 million boxed elements - that's 4 million pointers. You can call this erroneous if you want, but the issue is just what you think you're measuring compared to what you really are measuring.

Sharing Data Between Benchmarks

Data sharing can be accomplished through partial application. In my benchmarks I commonly have

let var = somethingCommon in
defaultMain [ bench "one" (nf (func1 var) input1)
            , bench "two" (nf (func2 var) input2)]

Avoiding Reuse in the Presence of Lazy Evaluation

Criterion avoids sharing by separating out your function and your input. You have signatures such as:

funcToBenchmark :: (NFData b) => a -> b
inputForFunc :: a

In Haskell, every time you apply funcToBenchmark to inputForFunc it creates a new thunk that needs to be evaluated. There is no sharing unless you reuse the same variable name as a previous computation. There is no automatic memoization - this seems to be a common misunderstanding.
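To see this concretely, here is a small stand-alone sketch (the function name expensive is made up for illustration, and Debug.Trace is used only to make evaluations visible):

```haskell
import Debug.Trace (trace)

-- A hypothetical "expensive" function; the trace fires once per evaluation.
expensive :: Int -> Int
expensive n = trace "evaluating expensive" (n * 2)

main :: IO ()
main = do
  -- Two separate applications build two separate thunks:
  -- the trace fires twice, once per application.
  print (expensive 21)
  print (expensive 21)
  -- A single named binding is shared: even though 'shared'
  -- appears twice below, the trace fires only once.
  let shared = expensive 21
  print (shared + shared)
```

The trace output (on stderr) shows three evaluations in total, not four: two for the separate applications, one for the shared binding.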

Notice the nuance in what isn't shared: the final result isn't shared, but the input is. If generating the input is what you want to benchmark (getRandList, in this case), then benchmark that, and not just the identity function under nf:

main = do
    gen <- getStdGen
    let size   = 2097152
        inData = getRandList gen size
        inVec  = V.fromList inData
    defaultMain
      [ bench "get input for real" $ nf (getRandList gen) size
      , bench "get input for real and run haarDWT and listify a vector" $
          nf (V.toList . haarDWT . V.fromList . getRandList gen) size
      , bench "screw generation, how fast is haarDWT" $ whnf haarDWT inVec
      ] -- for unboxed vectors, whnf is sufficient

Interpreting Data

The third benchmark is rather instructive. Let's look at what Criterion prints out:

benchmarking screw generation, how fast is haarDWT
collecting 100 samples, 1 iterations each, in estimated 137.3525 s
bootstrapping with 100000 resamples
mean: 134.7204 ms, lb 134.5117 ms, ub 135.0135 ms, ci 0.950

Based on a single run, Criterion estimated it would take 137 seconds to collect its 100 samples. About ten seconds later it was done - what happened? Well, the first run forced the entire input (inVec), which was expensive. The subsequent runs found a value instead of a thunk, so we truly benchmarked haarDWT and not the StdGen RNG (which is known to be painfully slow).
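One way to keep that first expensive run out of the numbers is to force the input to normal form yourself before calling defaultMain (newer Criterion versions also provide an env combinator for exactly this purpose). A minimal sketch using only base and deepseq, with a cheap stand-in list in place of the RNG-generated input:

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- Stand-in for the generated input (getRandList gen size in the post).
  let inData = map (* 2) [1 .. 1000000] :: [Int]
  -- Force the whole list to normal form up front, so the first
  -- benchmark sample doesn't also pay the cost of building it.
  _ <- evaluate (force inData)
  -- ... now hand inData to defaultMain / bench as before ...
  print (sum inData)
```

With the input fully evaluated before the benchmark loop starts, every sample measures the same thing, and Criterion's initial time estimate will no longer be skewed by a one-off construction cost.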
