Question
I want to write an akka-stream flow that groups events from an infinite stream by session_uid and calculates the total traffic for each session (details in my previous question).
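To make the goal concrete, here is a minimal sketch of the aggregation on a small finite batch, written in plain Scala without akka-stream. The `Event` case class, its field names, and `SessionTrafficSketch` are assumptions made up for illustration. The real input is an infinite stream, so this in-memory version only shows the result I am after, not a workable solution.

```scala
// Hypothetical event shape: only session_uid and traffic are mentioned above.
final case class Event(sessionUid: String, traffic: Long)

object SessionTrafficSketch {
  // Sums the traffic of each session for a finite batch of events.
  def trafficPerSession(events: Seq[Event]): Map[String, Long] =
    events
      .groupBy(_.sessionUid)
      .map { case (uid, sessionEvents) => uid -> sessionEvents.map(_.traffic).sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 10L), Event("b", 5L), Event("a", 7L))
    // Prints one total per session_uid.
    println(trafficPerSession(events))
  }
}
```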
I planned to use Source#groupBy to group the events by session_uid, but this function seems to accumulate every group key internally and offers no way to release them. This eventually causes a java.lang.OutOfMemoryError: Java heap space exception. Here is code to reproduce it:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.util.Random

object GroupByMemoryLeakApplication extends App {
  implicit val system = ActorSystem()
  import system.dispatcher
  implicit val materializer = ActorMaterializer()

  // Large payload so that the leak shows up quickly
  val bigString = Random.nextString(512 * 1024)

  // Effectively infinite stream of events (the integers stand in for session ids)
  val eventsSource = Source(() => (1 to 1000000000).iterator)
    .map(i => (i, bigString + i))

  // Flow that passes every event through groupBy
  val groupByFlow = Flow[(Int, String)]
    .groupBy(_._2) // the grouping key is the large string
    .map {
      case (sessionUid, sessionEvents) =>
        // Take only the first event of each group, then complete the substream
        sessionEvents
          .map(e => { println(e._1); e })
          .runWith(Sink.head)
    }
    .mapAsync(4)(identity)

  eventsSource
    .via(groupByFlow)
    .runWith(Sink.ignore)
    .onComplete(_ => system.shutdown())
}
So, how can I release a grouping key (sessionUid) inside groupBy once the related stream of events (sessionEvents) has been fully processed?
Or does anybody know another way to group events by session_uid with akka-stream?
Source: https://stackoverflow.com/questions/33865423/is-groupby-leaking-in-akka-stream