Question
I'm trying to interface a large Scala + Akka + PlayMini application with an external REST API. The idea is to poll a root URL periodically (every 1 to 10 minutes) and then crawl through the sub-level URLs to extract data, which is then sent to a message queue.
I have come up with two ways to do this:
1st way
Create a hierarchy of actors that mirrors the resource path structure of the API. For the Google Latitude API, for example, that would mean:
- Actor 'latitude/v1/currentLocation' polls https://www.googleapis.com/latitude/v1/currentLocation
- Actor 'latitude/v1/location' polls https://www.googleapis.com/latitude/v1/location
- Actor 'latitude/v1/location/1' polls https://www.googleapis.com/latitude/v1/location/1
- Actor 'latitude/v1/location/2' polls https://www.googleapis.com/latitude/v1/location/2
- Actor 'latitude/v1/location/3' polls https://www.googleapis.com/latitude/v1/location/3
- etc.
In this case, each actor is responsible for periodically polling its associated resource, as well as creating and deleting child actors for the next-level path resources (e.g. actor 'latitude/v1/location' creates actors 1, 2, 3, etc. for all the locations it learns about by polling https://www.googleapis.com/latitude/v1/location).
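The per-actor bookkeeping of the 1st way can be sketched as a small, dependency-free Scala model (plain objects, no Akka): each `PathPoller` stands for one actor, and `listChildren` is a hypothetical stand-in for fetching the URL and extracting the sub-resource ids. In the actor version, `poll()` would presumably be driven by `system.scheduler.schedule(...)` on each actor, and dropping a child would be `context.stop(child)`, which also stops that child's own children.

```scala
import scala.collection.mutable

// Dependency-free model of the 1st way: one PathPoller per resource path,
// standing in for one actor. `listChildren` is a hypothetical stand-in for
// "GET the URL and extract the sub-resource ids".
class PathPoller(val path: String, listChildren: String => Set[String]) {
  private val children = mutable.Map.empty[String, PathPoller]

  // One polling round for this resource and, recursively, its sub-resources.
  def poll(): Unit = {
    val current = listChildren(path)
    val known = children.keySet.toSet
    // Resources that disappeared: dropping a node drops its whole subtree,
    // just as stopping an actor also stops its children.
    (known -- current).foreach(id => children.remove(id))
    // Newly discovered resources get their own poller.
    (current -- known).foreach { id =>
      children(id) = new PathPoller(path + "/" + id, listChildren)
    }
    children.values.foreach(_.poll())
  }

  // Every path currently being polled in this subtree, sorted (parent first).
  def paths: List[String] =
    path :: children.values.toList.flatMap(_.paths).sorted
}
```

Polling the root twice, with location 1 removed and location 4 added in between, leaves pollers for exactly `location`, `location/2` and `location/4`; the removal needs no tracking beyond the parent's own child map.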
2nd way
Create a pool of identical polling actors behind a load-balancing router. Each poller receives a polling request (containing the resource path), polls that URL once, does some processing, and schedules new polling requests, both for the next-level resources and for the URL it just polled. For Google Latitude, that would mean, for instance:
One router and n poller actors. The initial polling request for https://www.googleapis.com/latitude/v1/location leads to several new (immediate) polling requests for https://www.googleapis.com/latitude/v1/location/1, https://www.googleapis.com/latitude/v1/location/2, etc., and one (delayed) polling request for the same resource, i.e. https://www.googleapis.com/latitude/v1/location.
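A single-threaded, dependency-free Scala model of the 2nd way (not Akka code) may help show where the scheduling logic lives: each `Request` stands for a message the router would hand to any idle poller, the priority queue plays the role of `scheduleOnce`, and `listChildren` is a hypothetical stand-in for the HTTP fetch. The `childrenOf`/`live` bookkeeping is an assumption of this sketch, not part of the design as described: without something like it, every re-poll of a parent would re-issue immediate requests for children that are already rescheduling themselves, and requests for deleted resources would keep firing.

```scala
import scala.collection.mutable

// One pending poll of `path`, due at simulated time `dueAt`; `gen` identifies
// the request chain so stale requests can be recognised and dropped.
final case class Request(dueAt: Long, path: String, gen: Long)

// Single-threaded model of the 2nd way (no Akka, no real router). `listChildren`
// is a hypothetical stand-in for "GET the URL and extract the sub-resource ids".
class PollingQueue(listChildren: String => Set[String], interval: Long) {
  // PriorityQueue dequeues its largest element, so the ordering is reversed to
  // get the earliest-due request first; ties go to the lexicographically smaller
  // path, so a parent is handled before its own children.
  private val queue = mutable.PriorityQueue.empty[Request](
    Ordering.by[Request, (Long, String)](r => (r.dueAt, r.path)).reverse)
  // Assumed bookkeeping (not in the original design): the sub-resources each
  // path had at its last poll, and the current request chain of each live path.
  private val childrenOf = mutable.Map.empty[String, Set[String]]
  private val live = mutable.Map.empty[String, Long]
  private var nextGen = 0L
  val polled = mutable.ListBuffer.empty[Request]

  // Start a new request chain for `path`, first due at time `at`.
  def submit(path: String, at: Long): Unit = {
    nextGen += 1
    live(path) = nextGen
    queue.enqueue(Request(at, path, nextGen))
  }

  // Forget a deleted resource and everything below it; their pending
  // requests are dropped when they come due.
  private def forget(path: String): Unit = {
    live -= path
    childrenOf.remove(path).getOrElse(Set.empty[String]).foreach(c => forget(c))
  }

  // Handle every request due at or before the simulated time `now`.
  def runUntil(now: Long): Unit =
    while (queue.nonEmpty && queue.head.dueAt <= now) {
      val req = queue.dequeue()
      if (live.get(req.path).contains(req.gen)) {
        polled += req
        val current = listChildren(req.path).map(id => req.path + "/" + id)
        val previous = childrenOf.getOrElse(req.path, Set.empty[String])
        (previous -- current).foreach(c => forget(c))
        // Immediate requests only for newly discovered children: known ones
        // already have their own delayed request pending.
        (current -- previous).foreach(c => submit(c, req.dueAt))
        childrenOf(req.path) = current
        // One delayed request for the resource that was just polled.
        queue.enqueue(req.copy(dueAt = req.dueAt + interval))
      }
    }
}
```

With this bookkeeping, each live resource ends up with exactly one pending request per interval. In an actual pool of identical pollers, that shared state would likely have to sit in one place, such as a coordinating actor in front of the router, because no poller owns a particular resource.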
I have implemented both solutions and can't observe any relevant difference in performance, at least not for the API and polling frequencies I am interested in. I find the first approach somewhat easier to reason about, and perhaps easier to use with system.scheduler.schedule(...) than the second approach (where I need scheduleOnce(...)). Also, assuming resources are nested several levels deep and are somewhat short-lived (e.g. several resources may be added or removed between two polls), Akka's lifecycle management makes it easy to kill off a whole branch in the first case. The second approach should (theoretically) be faster, and its code is somewhat easier to write.
My questions are:
- Which approach seems best (in terms of performance, extensibility, code complexity, etc.)?
- Do you see anything wrong with the design of either approach (esp. the 1st one)?
- Has anyone tried to implement anything similar? How was it done?
Thanks!
Answer 1:
Why not create a master poller, which then kicks off async resource requests on a schedule?
I'm no Akka expert, but I gave this a shot:
The poller object, which schedules a recurring poll message for each resource in the list:
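import akka.util.duration._
import akka.actor._
import play.api.Play.current
import play.api.libs.concurrent.Akka
object Poller {
  // Receives a resource name on every scheduler tick and starts a spider for it
  val poller = Akka.system.actorOf(Props(new Actor {
    def receive = {
      // No explicit name: a name derived from x would clash with a spider for
      // the same resource still running from the previous tick (and would be
      // empty if x contains no letters or digits)
      case x: String => Akka.system.actorOf(Props[ActingSpider]) ! x
    }
  }))
  // Schedule a recurring message (every 3 seconds) for each resource
  def start(l: List[String]): List[Cancellable] =
    l.map(Akka.system.scheduler.schedule(3 seconds, 3 seconds, poller, _))
  def stop(c: Cancellable) { c.cancel() }
}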
The actor that reads a resource (local files stand in for the REST resources here) and asynchronously triggers reads of the resources it lists. To be kinder to the server, you could schedule those follow-up messages instead of sending them immediately:
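import akka.actor.{Props, Actor}
import java.io.File
import scala.io.Source
class ActingSpider extends Actor {
  def receive = {
    case name: String => {
      println("reading " + name)
      new File(name) match {
        case f if f.exists() => spider(f)
        case _ => println("File not found")
      }
      // Done with this resource. This stops only this spider, because the
      // spiders it started are top-level actors, not its children
      context.stop(self)
    }
  }
  // Each line of the file names another resource to read
  def spider(file: File) {
    val source = Source.fromFile(file)
    try {
      source.getLines().foreach { l =>
        // Unnamed top-level spider: a child would be killed by context.stop(self)
        // above, and a name derived from the line would clash on duplicates
        context.system.actorOf(Props[ActingSpider]) ! l
      }
    } finally source.close()
  }
}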
Source: https://stackoverflow.com/questions/10654631/akka-for-rest-polling