问题
Imagine the following architecture. There is an actor in akka that receives push messages via websocket. They have a timestamp and interval between those timestamps is 1 minute. Though the messages with the same timestamp can arrive multiple times via websocket. And then this messages are being broadcasted to as example three further actors (ma). They calculate metrics and push the messages further to the one actor(c).
For ma I defined a TimeSeriesBuffer that allows writing to the buffer only if entities have consequent timestamps. After successfull push to the buffer ma's emit metrics, that go to the c. c can only change it's state when it has all three metrics. Therefore I defined a trait Synchronizable and then a SynchronizableTimeSeriesBuffer with "master-slave" architecture.
On each push to every buffer a check is triggered in order to understand if there are new elements in the buffers of all three SynchronizableTimeSeriesBuffer with the same timestamp that can be emitted further to c as a single message.
So here are the questions:
1) Is it too complicated of a solution?
2) Is there a better way to do it in terms of scala and akka?
3) Why is it not so fast and not so parallel when messages in the system instead of being received "one by one" are loaded from db in a big batch and fed to the system in order to backtest the metrics. (one of the buffers is filling much faster than the others, while other one is at 0 length). I have an assumption it has something to do with akka's settings regarding dispatching/mailbox.
I created a gist with regarding code: https://gist.github.com/ifif14/18b5f85cd638af7023462227cd595a2f
I would much appreciate the community's help in solving this nontrivial case.
Thanks in advance
Igor
回答1:
Simplification
It seems like much of your architecture is designed to ensure that your message are sequentially ordered in time. Why not just add a simple Actor
at the beginning that filters out duplicated messages? Then the rest of your system could be relatively simple.
As an example; given a message with timestamp
type Payload = ???
case class Message(timestamp : Long, payload : Payload)
You can write the filter Actor:
class FilterActor(ma : Iterable[ActorRef]) extends Actor {
var currentMaxTime = 0L
override def receive = {
case m : Message if m.timestamp > currentMaxTime => ma foreach (_ ! m)
case _ =>
}
}
Now you can eliminate all of the "TimeSeriesBuffer" and "Synchronizable" logic since you know that ma, and c, will only receive time-ordered messages.
Batch Processing
The likely reason why batch processing is not so concurrent is because the mailbox for your ma
Actor is being filled up by the database query and whatever processing it is doing is slower than the processing for c
. Therefore ma's mailbox continues to accumulate messages while c's mailbox remains relatively empty.
回答2:
Thanks so much for your answer. The part with cutting off is what I also implemented in Synchronizable Trait.
//clean up slaves. if their queue is behind masters latest element
master_last_timestamp match {
case Some(ts) => {
slaves.foreach { s =>
while ( s.queue.length > 0 && s.getElementTimestamp(s.queue.front) < ts ) {
s.dequeue()
}
// val els = s.dequeueAll { queue_el => s.getElementTimestamp(queue_el) < ts }
}
}
case _ => Unit
}
The reason why I started to implement the buffer is because I feel like I will be using it a lot in the system and I don't think to write this part for each actor I will be using. Seems easier to have a blueprint that does it.
But a more important reason is that for some reason one buffer is either being filled much slower or not at all than the other two. Though they are being filled by the same actors!! (just different instances, and computation time should be pretty much the same) And then after two other actors emitted all messages that were "passed" from the database the third one starts receiving it. It feels to me that this one actor is just not getting processor time. So I think it's a dispatcher's setting that can affect this. Are you familiar with this?
Also I would expect dispatcher work more like round-robin, given each process a little of execution time, but it ends up serving only limited amount of actors and then jumping to the next ones. Although they sort of have to receive initial messages at the same time since there is a broadcaster.
I read akka documentation on dispatchers and mailboxes, but I still don't understand how to do it.
Thank you
Igor
来源:https://stackoverflow.com/questions/47198177/akka-synchronizing-timestamped-messages-from-several-actors