Can a thunk be duplicated to improve memory performance?

三世轮回 提交于 2019-12-18 12:08:10

问题


One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.

Let's create a really big list:

let xs = [1..10000000]

Now, let's create a bad function:

bad = do
    print $ foldl1' (+) xs
    print $ length xs

With no optimizations, this eats up a few dozen MB of ram. The garbage collector can't deallocate xs during the fold because it will be needed for calculating the length later.

Is it possible to reimplement this function something like this:

good = do
    (xs1,xs2) <- copyThunk xs
    print $ foldl1' (+) xs1
    print $ length xs2

Now, xs1 and xs2 would represent the same value, but also be independent of each other in memory so the garbage collector can deallocate during the fold preventing memory wasting. (I think this would slightly increase the computational cost though?)

Obviously in this trivial example, refactoring the code could easily solve this problem, but It seems like it's not always obvious how to refactor. Or sometimes refactoring would greatly reduce code clarity.


回答1:


I was wondering the same thing a while ago and created a prototypical implementation of such a thunk-duplication function. You can read about the result in my preprint „dup – Explicit un-sharing in haskell” and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was neither accepted for the Haskell Symposium nor the Haskell Implementors Workshop this year.

To my knowledge, there is no real-world-ready solution to the problem; only fragile work-arounds as the unit parameter trick that might break due to one or the other compiler optimizations.




回答2:


Interesting question. I don't know how to implement copyThunk. But there is something else you can do (sorry if you already knew this):

xsFunction :: () -> [Int]
xsFunction = const [1..10000000]

better = do
  print $ foldl1' (+) $ xsFunction ()
  print $ length $ xsFunction ()

Here it definitely won't put the expression xsFunction () in a thunk, it will be calculated twice thus not making any memory bloat.


An interesting follow up on this is:

  • Can one ever implement copyThunk?
  • Should a haskell programmer ever be messing around with this relatively low level optimizations? Can't we assume ghc to outsmart us on this?



回答3:


Turn xs into a function. This may be ugly, but works, because it prevents sharing:

let xs () = [1..1000000]

good = do
    print $ foldl1' (+) (xs ())
    print $ length (xs ())


来源:https://stackoverflow.com/questions/11675807/can-a-thunk-be-duplicated-to-improve-memory-performance

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!