Mutable, random-access array/vector with high performance in haskell

 ̄綄美尐妖づ 提交于 2021-02-08 02:00:40

问题


This a topic on Haskell discussed a lot (e.g. mutable-array-implementation), but I am still not sure what is the best practice for the case requiring frequent modification and random-access of an array/vector.

Say a vector of length 1,000,000. Operation on it involves accessing a (small, e.g 1000) subset of it based on input, and modifying the values based on the input. Furthermore, such operation are repeated 2,000,000 times. The task itself can be implemented in pure data structure such as list as the following for example, though very inefficient:

type Vect = [Int]

f :: Vect -> [[Int]] -> Vect
f x indsList = foldl g x indsList

-- g is just an example of random-access and modifications on the values.
g :: Vect -> [Int] -> Vect
g x inds = map h $ zip x [0..]
    where h (x, i) = if i `elem` inds then x !! i + 1 else x !! i

Hash/Map data structure (e.g. IntMap) could be used for efficient large amounts of random-accesses, but array/vector should do it too. More importantly, the large amount of modifications is still need to be addressed by a mutable structure to avoid memory replication. Is there a mutable, random-acesss array/vector in Haskell indeed? If ST/IO Monads used, does such controls affect performance in my settings?


回答1:


Haskell does have efficient mutable arrays.

  • There is STUArray, which has the quite sophisticated but often just unnecessary Ix-indexing methodology with many bounds-checks and little special optimisation, which makes it a bit slower than theoretically possible.

  • All Data.Vector have very little overhead, make heavy use of stream fusion optimisation, prefer a simple, "list-like" interface. That means you can actually transfer your example very easily directly to immutable vectors, and may still get better performance than you might expect:

    import Data.Vector.Unboxed as VU
    
    type Vect = VU.Vector Int
    
    f :: Vect -> [[Int]] -> Vect
    f x indsList = VU.foldl g x indsList
    
    
    g :: Vect -> [Int] -> Vect
    g x inds = VU.zipWith h x [0..]
        -- h is just an example of modifications on the values.
        where h x i
               | i`elem`inds   = x VU.! i + 1
               | otherwise     = x VU.! i
    

Yes, you probably want to work in the ST monad for mutable updates. Not sure what you mean by "does such controls affect performance": there isn't really any "control" that's not also present in imperative languages, once the compiler has optimised away the proven type-safety. Which GHC can do quite well; you can get pretty close to C performance with Data.Vector.Unboxed. There is always some inevitable overhead, but that has mostly to do with garbage-collection etc. issues, which you would also get in Java.

Since ST and IO are monads, the compiler can in fact do some more high-level optimisations that wouldn't be possible in an imperative language, though compiler's aren't really that far yet.

Performance, particularly of array operations, is discussed in many places, for instance in RWH.




回答2:


Foreign UArrays from Yarr are mutable, random-access and maximum fast. Also they are "quick and dirty", i. e. don't impose freezing/thawing boilerplate for every mutation operation.

Disadvantage: almost all "low-level" operations are under IO.



来源:https://stackoverflow.com/questions/17892065/mutable-random-access-array-vector-with-high-performance-in-haskell

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!