How to improve performance of this numerical computation in Haskell?

前端 未结 2 860
时光取名叫无心
时光取名叫无心 2020-12-23 16:25

I\'m in the middle of porting David Blei\'s original C implementation of Latent Dirichlet Allocation to Haskell, and I\'m trying to decide whether to leave some of the low-l

2条回答
  •  天涯浪人
    2020-12-23 16:51

    Use the same control and data structures, yielding:

    {-# LANGUAGE BangPatterns #-}
    {-# OPTIONS_GHC -fvia-C -optc-O3 -fexcess-precision -optc-march=native #-}
    
    {-# INLINE trigamma #-}
    trigamma :: Double -> Double
    trigamma x = go 0 (x' - 1) p'
        where
            x' = x + 6
            p  = 1 / (x' * x')
    
            p' =(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238)
                      *p-0.033333333333333)*p+0.166666666666667)*p+1)/x'+0.5*p
    
            go :: Int -> Double -> Double -> Double
            go !i !x !p
                | i >= 6    = p
                | otherwise = go (i+1) (x-1) (1 / (x*x) + p)
    

    I don't have your testsuite, but this yields the following asm:

    A_zdwgo_info:
            cmpq    $5, %r14
            jg      .L3
            movsd   .LC0(%rip), %xmm7
            movapd  %xmm5, %xmm8
            movapd  %xmm7, %xmm9
            mulsd   %xmm5, %xmm8
            leaq    1(%r14), %r14
            divsd   %xmm8, %xmm9
            subsd   %xmm7, %xmm5
            addsd   %xmm9, %xmm6
            jmp     A_zdwgo_info
    

    Which looks ok. This is the kind of code the -fllvm backend does a good job.

    GCC unrolls the loop though, and the only way to do that is either via Template Haskell or manual unrolling. You might consider that (a TH macro) if doing a lot of this.

    Actually, the GHC LLVM backend does unroll the loop :-)

    Finally, if you really like the original Haskell version, write it using stream fusion combinators, and GHC will convert it back into loops. (Exercise for the reader).

提交回复
热议问题