Haskell math performance on multiply-add operation

南方客 2021-01-30 18:05

I'm writing a game in Haskell, and my current pass at the UI involves a lot of procedural generation of geometry. I am currently focused on identifying performance of one parti

2 answers
  •  情书的邮戳
    2021-01-30 18:18

    Roman Leshchinskiy responds:

    Actually, the core looks mostly ok to me. Using unsafeIndex instead of (!) makes the program more than twice as fast (see my answer above). The program below is much faster, though (and cleaner, IMO). I suspect the remaining difference between this and the C program is due to GHC's general suckiness when it comes to floating point. GHC HEAD produces the best results with the NCG and -msse2.
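
    To illustrate the unsafeIndex point, here is a minimal sketch of the same indexing loop written both ways. The names sumChecked and sumUnchecked are hypothetical and are not taken from the original program; the sketch assumes the vector package is installed. How much the bounds check costs depends on the GHC version and on how the vector package was built.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.Vector.Storable as V

-- Sum using (V.!), which checks the index against the vector length
-- on every access (in a default build of the vector package).
sumChecked :: V.Vector Float -> Float
sumChecked v = go 0 0
  where
    n = V.length v
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc + (v V.! i)) (i + 1)

-- The same loop using V.unsafeIndex, which skips the bounds check.
-- This is only safe because the guard keeps i within [0, n).
sumUnchecked :: V.Vector Float -> Float
sumUnchecked v = go 0 0
  where
    n = V.length v
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc + V.unsafeIndex v i) (i + 1)

main :: IO ()
main = do
  let xs = V.enumFromN 1 1000 :: V.Vector Float
  -- Both loops add the same elements in the same order.
  print (sumChecked xs == sumUnchecked xs)
```

    Both functions visit the elements in the same order, so they should agree on any input; only the per-access check differs.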

    First, define a new Vec4 data type:

    {-# LANGUAGE BangPatterns #-}
    
    import Data.Vector.Storable
    import qualified Data.Vector.Storable as V
    import Foreign
    import Foreign.C.Types
    
    -- Define a 4 element vector type
    data Vec4 = Vec4 {-# UNPACK #-} !CFloat
                     {-# UNPACK #-} !CFloat
                     {-# UNPACK #-} !CFloat
                     {-# UNPACK #-} !CFloat
    

    Ensure we can store it in an array:

    instance Storable Vec4 where
      sizeOf _ = sizeOf (undefined :: CFloat) * 4
      alignment _ = alignment (undefined :: CFloat)
    
      {-# INLINE peek #-}
      peek p = do
                 a <- peekElemOff q 0
                 b <- peekElemOff q 1
                 c <- peekElemOff q 2
                 d <- peekElemOff q 3
                 return (Vec4 a b c d)
        where
          q = castPtr p
      {-# INLINE poke #-}
      poke p (Vec4 a b c d) = do
                 pokeElemOff q 0 a
                 pokeElemOff q 1 b
                 pokeElemOff q 2 c
                 pokeElemOff q 3 d
        where
          q = castPtr p
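
    As a quick sanity check of an instance like this one, the sketch below writes a Vec4 to temporary memory and reads it back. It uses only the base library. The roundTrip helper and the derived Eq/Show instances are additions for illustration and are not part of the answer's code; the UNPACK and INLINE pragmas are left out for brevity.

```haskell
import Foreign.C.Types (CFloat)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (castPtr)
import Foreign.Storable

-- Same layout as the answer's Vec4: four strict CFloat fields.
data Vec4 = Vec4 !CFloat !CFloat !CFloat !CFloat
  deriving (Eq, Show)

instance Storable Vec4 where
  sizeOf _ = 4 * sizeOf (undefined :: CFloat)
  alignment _ = alignment (undefined :: CFloat)
  peek p = do
    let q = castPtr p
    a <- peekElemOff q 0
    b <- peekElemOff q 1
    c <- peekElemOff q 2
    d <- peekElemOff q 3
    return (Vec4 a b c d)
  poke p (Vec4 a b c d) = do
    let q = castPtr p
    pokeElemOff q 0 a
    pokeElemOff q 1 b
    pokeElemOff q 2 c
    pokeElemOff q 3 d

-- Hypothetical helper: 'with' allocates temporary storage, pokes the
-- value into it, and passes the pointer to 'peek' to read it back.
roundTrip :: Vec4 -> IO Vec4
roundTrip v = with v peek

main :: IO ()
main = do
  let v = Vec4 0.2 0.1 0.6 1.0
  v' <- roundTrip v
  print (sizeOf v)   -- 16 on platforms where CFloat is 4 bytes
  print (v' == v)    -- the four fields should survive the round trip
```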
    

    Values and methods on this type:

    a = Vec4 0.2 0.1 0.6 1.0
    m = Vec4 0.99 0.7 0.8 0.6
    
    add :: Vec4 -> Vec4 -> Vec4
    {-# INLINE add #-}
    add (Vec4 a b c d) (Vec4 a' b' c' d') = Vec4 (a+a') (b+b') (c+c') (d+d')
    
    mult :: Vec4 -> Vec4 -> Vec4
    {-# INLINE mult #-}
    mult (Vec4 a b c d) (Vec4 a' b' c' d') = Vec4 (a*a') (b*b') (c*c') (d*d')
    
    vsum :: Vec4 -> CFloat
    {-# INLINE vsum #-}
    vsum (Vec4 a b c d) = a+b+c+d
    
    multList :: Int -> Vector Vec4 -> Vector Vec4
    multList !count !src
        | count <= 0    = src
        | otherwise     = multList (count-1) $ V.map (\v -> add (mult v m) a) src
    
    main :: IO ()
    main = do
        -- V is the qualified alias for Data.Vector.Storable imported above
        print $ V.sum
              $ V.map vsum
              $ multList repCount
              $ V.replicate arraySize (Vec4 0 0 0 0)
    
    repCount, arraySize :: Int
    repCount = 10000
    arraySize = 20000
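
    For checking the printed result, the value the program should approach can be worked out by hand. The derivation below is a sketch in exact real arithmetic, so it ignores single-precision rounding; the symbols x_k and n are introduced here for illustration. Each component starts at 0 and is updated repCount times with its own m and a:

```latex
x_{k+1} = m\,x_k + a,\quad x_0 = 0
\;\Longrightarrow\;
x_n = a\,\frac{1 - m^{n}}{1 - m} \;\approx\; \frac{a}{1 - m}
\quad (|m| < 1,\ n = 10000)

\frac{0.2}{1 - 0.99} + \frac{0.1}{1 - 0.7} + \frac{0.6}{1 - 0.8} + \frac{1.0}{1 - 0.6}
= 20 + 0.333\ldots + 3 + 2.5 \approx 25.83

20000 \times 25.83\ldots \approx 5.17 \times 10^{5}
```

    The largest multiplier is 0.99, and 0.99^10000 is about e^(-100), so the m^n term is negligible. The program should therefore print a value of roughly 5.17e5; the exact digits will differ because the arithmetic and the final sum are carried out in CFloat.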
    

    With GHC 6.12.1, -O2 -fasm:

    • 1.752s

    With GHC HEAD (June 26), -O2 -fasm -msse2:

    • 1.708s

    This looks like the most idiomatic way to write a Vec4 array, and gets the best performance (11x faster than your original). (And this might become a benchmark for GHC's LLVM backend.)
