Question
I have actors that need to do very long-running and computationally expensive work, but the computation itself can be done incrementally. So while the complete computation takes hours to finish, the intermediate results are actually extremely useful, and I'd like to be able to respond to any requests for them. This is pseudocode for what I want to do:
var intermediateResult = ...
loop {
  while (mailbox.isEmpty && computationNotFinished)
    intermediateResult = computationStep(intermediateResult)
  receive {
    case GetCurrentResult => sender ! intermediateResult
    ...other messages...
  }
}
Answer 1:
I assume from your comment on Roland Kuhn's answer that you have some work which can be considered recursive, at least in blocks. If this is not the case, I don't think there is any clean solution to your problem, and you will have to deal with complicated pattern matching blocks.
If my assumptions are correct, I would schedule the computation asynchronously and leave the actor free to answer other messages. The key point is to use the monadic capabilities of Future and to keep the receive block simple. You would have to handle three messages (StartComputation, ChangeState, GetState).
You will end up having the following receive:
def receive = {
  case StartComputation(myData) => expensiveStuff(myData)
  case ChangeState(newState) => this.state = newState // sent by each computation step
  case GetState => sender ! this.state
}
And then you can leverage the map method on Future, by defining your own recursive map:
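// requires an implicit ExecutionContext in scope
def mapRecursive[A](f: Future[A], handler: A => A, exitConditions: A => Boolean): Future[A] = {
  f.flatMap { a =>
    if (exitConditions(a))
      f
    else {
      // run the next step in its own Future, then recurse on the result
      val newFuture = f.flatMap { a => Future(handler(a)) }
      mapRecursive(newFuture, handler, exitConditions)
    }
  }
}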
Once you have this tool, everything is easier. Consider the following example:
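def main(args: Array[String]): Unit = {
  // In scala.concurrent (Scala 2.10+), Promise.successful returns a Promise, not a Future,
  // so an already-completed Future is created with Future.successful.
  val baseFuture: Future[Int] = Future.successful(64)
  val newFuture: Future[Int] = mapRecursive(baseFuture,
    (a: Int) => {
      val result = a / 2
      println("Additional step done: the current a is " + result)
      result
    }, (a: Int) => (a <= 1))
  val one = Await.result(newFuture, Duration.Inf)
  println("Computation finished, result = " + one)
}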
Its output is:
Additional step done: the current a is 32
Additional step done: the current a is 16
Additional step done: the current a is 8
Additional step done: the current a is 4
Additional step done: the current a is 2
Additional step done: the current a is 1
Computation finished, result = 1
You can do the same inside your expensiveStuff method:
def expensiveStuff(myData: MyData): Future[MyData] = {
  // Future.successful, since Promise.successful does not return a Future in scala.concurrent
  val firstResult = Future.successful(myData)
  val handler: MyData => MyData = (myData) => {
    val result = myData.copy(myData.value / 2)
    self ! ChangeState(result) // publish the intermediate result to the actor
    result
  }
  val exitCondition: MyData => Boolean = (myData: MyData) => myData.value == 1
  mapRecursive(firstResult, handler, exitCondition)
}
EDIT - MORE DETAILED
If you don't want to block the Actor, which processes messages from its mailbox in a thread-safe and synchronous manner, the only thing you can do is to have the computation executed on a different thread. This is exactly what a high-performance non-blocking receive amounts to.
However, you were right in saying that the approach I propose pays a high performance penalty. Every step is done on a different future, which might not be necessary at all. You can therefore recurse the handler to obtain single-threaded or multi-threaded execution. There is no magic formula after all:
- If you want to schedule asynchronously and minimize the cost, all the work should be done by a single thread.
- This, however, could prevent other work from starting, because if all the threads in a thread pool are taken, the futures will queue. You might therefore want to break the operation into multiple futures so that even at full usage some new work can be scheduled before old work has been completed.
def recurseFuture[A](entryFuture: Future[A], handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long = Long.MaxValue): Future[A] = {
  def recurse(a: A, handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long, currentStep: Long): Future[A] = {
    if (exitCondition(a))
      Future.successful(a)
    else if (currentStep == maxNestedRecursion)
      // depth limit reached: continue in a new future and reset the step counter
      Future.successful(handler(a)).flatMap(a => recurse(a, handler, exitCondition, maxNestedRecursion, 0))
    else
      // stay on the current thread for the next step
      recurse(handler(a), handler, exitCondition, maxNestedRecursion, currentStep + 1)
  }
  entryFuture.flatMap { a => recurse(a, handler, exitCondition, maxNestedRecursion, 0) }
}
For testing purposes, I have enhanced my handler method so that it also prints the thread it runs on:
val handler: Int => Int = (a: Int) => {
  val result = a / 2
  println("Additional step done: the current a is " + result + " on thread " + Thread.currentThread().getName)
  result
}
Approach 1: Recurse the handler on itself so that all steps execute on a single thread.
println("Starting strategy with all the steps on the same thread")
val deepestRecursion: Future[Int] = recurseFuture(baseFuture,handler, exitCondition)
Await.result(deepestRecursion, Duration.Inf)
println("Completed strategy with all the steps on the same thread")
println("")
Approach 2: Recurse the handler on itself only up to a limited depth.
println("Starting strategy with the steps grouped by three")
val threeStepsInSameFuture: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,3)
val threeStepsInSameFuture2: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,4)
Await.result(threeStepsInSameFuture, Duration.Inf)
Await.result(threeStepsInSameFuture2, Duration.Inf)
println("Completed strategy with all the steps grouped by three")
executorService.shutdown()
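The snippets above leave out the surrounding setup (baseFuture, exitCondition, the executor). As a rough, self-contained sketch of the same chunking idea using only scala.concurrent: the names ChunkedFutureDemo and runChunked and the pool size of 2 are assumptions made for illustration, not part of the original answer. Unlike recurseFuture, this variant runs every chunk inside its own Future rather than on the thread that completed the previous one.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ChunkedFutureDemo {
  /** Applies `step` until `done` holds, starting a new Future after every `chunk` steps. */
  def runChunked[A](start: A, chunk: Int)(step: A => A, done: A => Boolean)
                   (implicit ec: ExecutionContext): Future[A] = {
    require(chunk > 0, "chunk must be positive")

    // Apply at most `remaining` steps synchronously inside the current Future.
    @annotation.tailrec
    def batch(a: A, remaining: Int): A =
      if (remaining == 0 || done(a)) a else batch(step(a), remaining - 1)

    Future(batch(start, chunk)).flatMap { a =>
      // Either finish, or hand the next chunk back to the pool as a new task.
      if (done(a)) Future.successful(a) else runChunked(a, chunk)(step, done)
    }
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2) // assumed pool size
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    // 64 is halved three times per Future: 64 -> 8, then 8 -> 1.
    val result = Await.result(runChunked(64, 3)(_ / 2, _ <= 1), Duration.Inf)
    println("result = " + result)
    pool.shutdown()
  }
}
```

A larger chunk value means fewer task hand-offs but longer stretches during which a pool thread is unavailable to other work, which is the trade-off described in the two bullet points above.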
Answer 2:
The best way to do this is very close to what you are doing already:
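import akka.actor.Actor

case class Work(todo: ToDo) // assumed shape: carries the initial work description
case class Continue(todo: ToDo)

class Worker extends Actor {
  var state: IntermediateState = _
  def receive = {
    case Work(x) =>
      val (next, todo) = calc(state, x)
      state = next
      self ! Continue(todo)
    case Continue(todo) if todo.isEmpty => // done
    case Continue(todo) =>
      val (next, rest) = calc(state, todo)
      state = next
      self ! Continue(rest) // goes to the back of the mailbox, so queries get a turn
    // a GetCurrentResult case replying with `state` would go here
  }
  // one step of the computation; application-specific
  def calc(state: IntermediateState, todo: ToDo): (IntermediateState, ToDo) = ???
}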
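To see why the self-send lets queries interleave with the computation, here is a toy model of a FIFO mailbox written with plain Scala collections. It is not Akka code: SelfMessageSketch, the Msg hierarchy, and run are hypothetical names, and the "actor" is just a loop draining a queue.

```scala
import scala.collection.mutable

object SelfMessageSketch {
  sealed trait Msg
  case class Continue(remaining: Int) extends Msg
  case object GetCurrentResult extends Msg

  /** Drains a simulated mailbox and returns the state seen by each query. */
  def run(initial: List[Msg]): List[Long] = {
    val mailbox = mutable.Queue[Msg](initial: _*)
    var state = 0L                           // stands in for the intermediate result
    val replies = mutable.ListBuffer[Long]() // stands in for `sender ! state`
    while (mailbox.nonEmpty) {
      mailbox.dequeue() match {
        case Continue(0) => // done
        case Continue(n) =>
          state += 1                         // one computation step
          mailbox.enqueue(Continue(n - 1))   // "self ! Continue(rest)"
        case GetCurrentResult =>
          replies += state
      }
    }
    replies.toList
  }
}
```

For example, SelfMessageSketch.run(List(Continue(3), GetCurrentResult)) returns List(1): the query was already waiting in the mailbox, so it is answered after the first step instead of after the whole computation.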
EDIT: more background
When an actor sends messages to itself, Akka’s internal processing will basically run those within a while loop; the number of messages processed in one go is determined by the actor’s dispatcher’s throughput setting (defaults to 5). After this amount of processing, the thread is returned to the pool and the continuation is enqueued to the dispatcher as a new task. Hence there are two tunables in the above solution:
- process multiple steps for a single message (if processing steps are REALLY small)
- increase the throughput setting for increased throughput and decreased fairness
The original problem seems to have hundreds of such actors running, presumably on common hardware which does not have hundreds of CPUs, so the throughput setting should probably be set such that each batch takes no longer than ca. 10ms.
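One way to apply the first tunable is to size each batch by elapsed time rather than by a fixed step count. The helper below is a minimal sketch under that assumption; BatchedSteps and runBatch are hypothetical names, and the 10 ms figure in the usage note is simply the budget suggested above.

```scala
object BatchedSteps {
  /**
   * Applies `step` repeatedly until `done` holds or `budgetNanos` has elapsed,
   * and returns the state reached so far. The time check happens before each
   * step, so a single slow step can still overrun the budget.
   */
  def runBatch[S](state: S, budgetNanos: Long)(step: S => S, done: S => Boolean): S = {
    val start = System.nanoTime()
    var current = state
    // Compare elapsed time by subtraction, which is safe against nanoTime wrap-around.
    while (!done(current) && System.nanoTime() - start < budgetNanos) {
      current = step(current)
    }
    current
  }
}
```

Inside an actor, the handler for a continuation message could call something like runBatch(state, 10000000L)(step, done) (10 ms expressed in nanoseconds), store the result, and send itself another continuation message if the computation is not finished yet.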
Performance Assessment
Let’s play a bit with Fibonacci:
Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) = if(steps>0) fib(x2, x1+x2, steps-1) else (x1, x2)
fib: (x1: BigInt, x2: BigInt, steps: Int)(BigInt, BigInt)
scala> def time(code: =>Unit) { val start = System.currentTimeMillis; code; println("code took " + (System.currentTimeMillis - start) + "ms") }
time: (code: => Unit)Unit
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 10000))
code took 5ms
scala> time(fib(1, 1, 100000))
code took 455ms
scala> time(fib(1, 1, 1000000))
code took 17172ms
Which means that in a presumably quite optimized loop, fib_100000 takes half a second. Now let’s play a bit with actors:
scala> case class Cont(steps: Int, batch: Int)
defined class Cont
scala> val me = inbox()
me: akka.actor.ActorDSL.Inbox = akka.actor.dsl.Inbox$Inbox@32c0fe13
scala> val a = actor(new Act {
var s: (BigInt, BigInt) = _
become {
case Cont(x, y) if y < 0 => s = (1, 1); self ! Cont(x, -y)
case Cont(x, y) if x > 0 => s = fib(s._1, s._2, y); self ! Cont(x - 1, y)
case _: Cont => me.receiver ! s
}
})
a: akka.actor.ActorRef = Actor[akka://repl/user/$c]
scala> time{a ! Cont(1000, -1); me.receive(10 seconds)}
code took 4ms
scala> time{a ! Cont(10000, -1); me.receive(10 seconds)}
code took 27ms
scala> time{a ! Cont(100000, -1); me.receive(10 seconds)}
code took 632ms
scala> time{a ! Cont(1000000, -1); me.receive(30 seconds)}
code took 17936ms
This is already interesting: given long enough time per step (with the huge BigInts behind the scenes in the last line), actors don’t cost much extra. Now let’s see what happens when doing smaller calculations in a more batched way:
scala> time{a ! Cont(10000, -10); me.receive(30 seconds)}
code took 462ms
This is pretty close to the result for the direct variant above.
Conclusion
Sending messages to self is NOT expensive for almost all applications, just keep the actual processing step slightly larger than a few hundred nanoseconds.
Answer 3:
You should not use Actors for long-running computations, as these will block the threads that are supposed to run the Actors' code.
I would try a design that uses a separate Thread/ThreadPool for the computations and uses AtomicReferences to store and query the intermediate results, along the lines of the following pseudocode:
val cancelled = new AtomicBoolean(false)
val intermediateResult = new AtomicReference[IntermediateResult]()
object WorkerThread extends Thread {
  override def run(): Unit = {
    while (!cancelled.get) {
      intermediateResult.set(computationStep(intermediateResult.get))
    }
  }
}
loop {
  react {
    case StartComputation => WorkerThread.start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult => sender ! intermediateResult.get
  }
}
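The pseudocode can be turned into something runnable without any actor library. The following is a minimal sketch, assuming a single worker thread and a pure step function; AtomicProgress and Worker are hypothetical names introduced only for this illustration.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

object AtomicProgress {
  /** Runs `step` in a loop on its own thread and exposes the latest result. */
  final class Worker[A](initial: A, step: A => A) {
    private val cancelled = new AtomicBoolean(false)
    private val latest = new AtomicReference[A](initial)
    private val thread = new Thread(new Runnable {
      def run(): Unit =
        // Only this thread writes `latest`, so a plain get/set pair is enough.
        while (!cancelled.get) latest.set(step(latest.get))
    })

    def start(): Unit = thread.start()

    /** Safe to call from any thread, including from inside an actor. */
    def current: A = latest.get

    /** Signals the loop to stop and waits for the worker thread to exit. */
    def cancel(): Unit = {
      cancelled.set(true)
      thread.join()
    }
  }
}
```

An actor would then only forward messages: start the worker on StartComputation, call cancel on CancelComputation, and reply with current on GetCurrentResult. Note that cancel blocks until the step in progress finishes, so a very long step delays the caller.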
Answer 4:
This is a classic concurrency problem. You want several routines/actors (or whatever you want to call them). The code is mostly correct Go, with obscenely long variable names for context. The first routine handles queries and intermediate results:
func serveIntermediateResults(
    computationChannel chan *IntermediateResult,
    queryChannel chan chan<- *IntermediateResult) {
    var latestIntermediateResult *IntermediateResult // initial result
    for {
        select {
        // an update arrives
        case result, notClosed := <-computationChannel:
            if !notClosed {
                // the computation has finished, stop checking
                // (receiving from a nil channel blocks forever, disabling this case)
                computationChannel = nil
            } else {
                // assign to the outer variable; using := in the case would shadow it
                latestIntermediateResult = result
            }
        // a query arrived
        case queryResponseChannel, notClosed := <-queryChannel:
            if !notClosed {
                // no more queries, so we're done
                return
            }
            // respond with the latest result
            queryResponseChannel <- latestIntermediateResult
        }
    }
}
In your long computation, you update your intermediate result wherever appropriate:
func longComputation(intermediateResultChannel chan *IntermediateResult) {
    for notFinished {
        // lots of stuff
        intermediateResultChannel <- currentResult
    }
    close(intermediateResultChannel)
}
Finally, to ask for the current result, you have a wrapper to make this nice:
func getCurrentResult() *IntermediateResult {
    responseChannel := make(chan *IntermediateResult)
    // queryChannel was given to the intermediate result server routine earlier
    queryChannel <- responseChannel
    return <-responseChannel
}
Source: https://stackoverflow.com/questions/12851996/incremental-processing-in-an-akka-actor