Monadic fold with State monad in constant space (heap and stack)?

小鲜肉 2021-02-20 04:35

Is it possible to perform a fold in the State monad in constant stack and heap space? Or is a different functional technique a better fit to my problem?

The next section

2 answers
  •  旧时难觅i
    2021-02-20 05:13

    "Our real issue is the heap used by the unexecuted State mobits."

    No, it is not. The real issue is that the collection doesn't fit in memory and that foldLeftM and foldRightM force the entire collection. A side effect of the impure solution is that you are freeing memory as you go. In the "purely functional" solution, you're not doing that anywhere.

    Your use of Iterable ignores a crucial detail: what kind of collection col actually is, how its elements are created and how they are expected to be discarded. And so, necessarily, does foldLeftM on Iterable. It is likely too strict, and you are forcing the entire collection into memory. For example, if it is a Stream, then as long as you are holding on to col all the elements forced so far will be in memory. If it's some other kind of lazy Iterable that doesn't memoize its elements, then the fold is still too strict.

    I tried your first example with an EphemeralStream and did not see any significant heap pressure, even though it will clearly have the same "unexecuted State mobits". The difference is that an EphemeralStream's elements are weakly referenced and its foldRight doesn't force the entire stream.

    I suspect that if you used Foldable.foldr, then you would not see the problematic behaviour since it folds with a function that is lazy in its second argument. When you call the fold, you want it to return a suspension that looks something like this immediately:

    Suspend(() => head |+| tail.foldRightM(...))
    

    When the trampoline resumes the first suspension and runs up to the next suspension, all of the allocations between suspensions will become available to be freed by the garbage collector.
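
    To make "lazy in its second argument" concrete, here is a minimal sketch using only the Scala standard library. It assumes Scala 2.13 or later (for LazyList), and the name foldr is a hypothetical helper, not scalaz's Foldable.foldr. The combining function receives the rest of the fold as a by-name value, so it can stop without forcing the remaining elements. Note that this sketch is not trampolined, so a fold that never short-circuits can still overflow the stack.

    ```scala
    object LazyFoldR {
      // Right fold whose combining function takes the rest of the fold by name.
      // The recursive call is only evaluated if f actually uses its second argument.
      def foldr[A, B](xs: LazyList[A], z: => B)(f: (A, => B) => B): B =
        if (xs.isEmpty) z
        else f(xs.head, foldr(xs.tail, z)(f))

      // Example: search an infinite LazyList; the fold stops at the first match.
      def containsAtLeast(xs: LazyList[Int], n: Int): Boolean =
        foldr(xs, false)((a, rest) => a >= n || rest)
    }
    ```

    For example, LazyFoldR.containsAtLeast(LazyList.from(1), 5) terminates even though the input is infinite, because || never evaluates rest once a match is found.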

    Try the following:

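    // Monadic left fold: the recursive call sits inside the bind continuation,
    // so the next step is only built once the previous one has run.
    def foldM[M[_]:Monad,A,B](a: A, bs: Iterable[B])(f: (A, B) => M[A]): M[A] =
      if (bs.isEmpty) Monad[M].point(a)
      else Monad[M].bind(f(a, bs.head))(fax => foldM(fax, bs.tail)(f))
    
    // State over Int, layered on Trampoline so that binds do not consume stack.
    val MS = StateT.stateTMonadState[Int, Trampoline]
    import MS._
    
    // M, R and col are not defined here; presumably they come from the question.
    foldM[M,R,Int](Monoid[R].zero, col) {
      (x, r) => modify(_ + 1) map (_ => Monoid[R].append(x, r))
    } run 0 run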
    

    This will run in constant heap for a trampolined monad M, but will overflow the stack for a non-trampolined monad.
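
    The same idea can be tried without scalaz. The following is a rough sketch under stated assumptions, not the scalaz implementation: TrampState is a hypothetical hand-rolled state monad layered on the standard library's scala.util.control.TailCalls trampoline, and the fold takes an Iterator instead of an Iterable so that consumed elements are not retained. Because the Iterator is consumed as the fold runs, the resulting value can only be run once.

    ```scala
    import scala.util.control.TailCalls._

    object TrampolinedFold {
      // Hypothetical minimal state monad over the stdlib trampoline (TailRec).
      final case class TrampState[S, A](run: S => TailRec[(S, A)]) {
        def flatMap[B](f: A => TrampState[S, B]): TrampState[S, B] =
          TrampState[S, B] { s =>
            // Both steps are suspended, so running them uses constant stack.
            tailcall(run(s)).flatMap { case (s1, a) => tailcall(f(a).run(s1)) }
          }
        def map[B](f: A => B): TrampState[S, B] =
          flatMap[B](a => TrampState.point[S, B](f(a)))
      }

      object TrampState {
        def point[S, A](a: A): TrampState[S, A] =
          TrampState[S, A](s => done((s, a)))
        def modify[S](f: S => S): TrampState[S, Unit] =
          TrampState[S, Unit](s => done((f(s), ())))
      }

      // Monadic left fold. The recursive call lives inside the flatMap
      // continuation, so the next element is pulled only while running.
      def foldM[S, A, B](a: A, bs: Iterator[B])(f: (A, B) => TrampState[S, A]): TrampState[S, A] =
        if (!bs.hasNext) TrampState.point[S, A](a)
        else f(a, bs.next()).flatMap(a1 => foldM(a1, bs)(f))
    }
    ```

    A usage along the lines of the snippet above would be foldM[Int, Long, Int](0L, Iterator.range(0, 1000000)) { (acc, b) => TrampState.modify[Int](_ + 1).map(_ => acc + b) }.run(0).result, which yields the final counter paired with the accumulated sum.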

    But the real problem is that Iterable is not a good abstraction for data that are too large to fit in memory. Sure, you can write an imperative side-effecty program where you explicitly discard elements after each iteration or use a lazy right fold. That works well until you want to compose that program with another one. And I'm assuming that the whole reason you're investigating doing this in a State monad to begin with is to gain compositionality.

    So what can you do? Here are some options:

    1. Make use of Reducer, Monoid, and composition thereof, then run in an imperative explicitly-freeing loop (or a trampolined lazy right fold) as the last step, after which composition is not possible or expected.
    2. Use Iteratee composition and monadic Enumerators to feed them.
    3. Write compositional stream transducers with Scalaz-Stream.

    The last of these options is the one that I would use and recommend in the general case.
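
    For comparison, the first option in the list can be sketched with the standard library alone. Everything here is an illustrative assumption: Monoid is a hypothetical minimal trait standing in for scalaz's, product is the composition step, and reduceAll is the final imperative loop after which no further composition is expected.

    ```scala
    object MonoidFold {
      // Hypothetical minimal Monoid; scalaz's Monoid plays this role in practice.
      trait Monoid[A] {
        def zero: A
        def append(x: A, y: A): A
      }

      val intSum: Monoid[Int] = new Monoid[Int] {
        def zero: Int = 0
        def append(x: Int, y: Int): Int = x + y
      }

      val longSum: Monoid[Long] = new Monoid[Long] {
        def zero: Long = 0L
        def append(x: Long, y: Long): Long = x + y
      }

      // Composition: the product of two monoids is itself a monoid,
      // so several aggregations can share a single pass over the data.
      def product[A, B](ma: Monoid[A], mb: Monoid[B]): Monoid[(A, B)] =
        new Monoid[(A, B)] {
          def zero: (A, B) = (ma.zero, mb.zero)
          def append(x: (A, B), y: (A, B)): (A, B) =
            (ma.append(x._1, y._1), mb.append(x._2, y._2))
        }

      // Last step: an imperative loop over an Iterator. Each element becomes
      // unreachable after it has been folded into the accumulator.
      def reduceAll[A, R](it: Iterator[A])(unit: A => R)(m: Monoid[R]): R = {
        var acc = m.zero
        while (it.hasNext) acc = m.append(acc, unit(it.next()))
        acc
      }
    }
    ```

    Here reduceAll(Iterator.range(1, 1001))(i => (1, i.toLong))(product(intSum, longSum)) computes an element count and a sum in one pass.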
