Question
I have a DataSet.map operation that needs to pull data in from an external REST API.
The REST API client returns a Future[Int].
Is it possible to have the DataSet.map operation somehow await this Future asynchronously? Or will I need to block the thread using Await.result? Or is this just not the done thing... i.e. should I instead try and load the data held by the API into a DataSet of its own, and perform a join?
Thanks in advance!
EDIT:
Different from: Spark job with Async HTTP call
Reason: This question is open to discussing how to solve the problem differently, say, using a second DataSet and a join instead. Furthermore, the linked question contains no definitive answer as to whether Spark can handle asynchronous transformations - and if it can - how they should be structured.
Answer 1:
It's an interesting question (that I don't think is a duplicate of the other question either).
Yes, you can submit Spark jobs asynchronously, which leaves the calling thread free to do whatever it wants after the call. This is SparkContext.submitJob.
Yes, you can run Spark jobs simultaneously from multiple threads using the very same SparkContext, i.e. SparkContext is thread-safe.
Given the two options, you can have a thread pool (using java.util.concurrent.Executors) and execute Spark jobs that in turn execute an asynchronous action, say "pull data in from an external REST API that returns a Future[Int]."
Now, this part has nothing to do with Spark. How you get notified about the result of a Future[Int] is up to you: you can Await it, or register a callback that is called when a Success or a Failure happens.
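To make the two notification styles concrete, here is a minimal sketch using only the Scala standard library. The `fetchScore` function is a hypothetical stand-in for the REST client from the question (it just multiplies by 10), and the 5-second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureNotification {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for the REST client; a real client would make an HTTP call.
  def fetchScore(id: Int): Future[Int] = Future(id * 10)

  // Option 1: block the calling thread until the result arrives or the timeout expires.
  def blocking(id: Int): Int = Await.result(fetchScore(id), 5.seconds)

  // Option 2: register a callback and return immediately; the handler runs
  // later on the execution context, once the Future has completed.
  def withCallback(id: Int)(handle: Either[Throwable, Int] => Unit): Unit =
    fetchScore(id).onComplete {
      case Success(value) => handle(Right(value))
      case Failure(error) => handle(Left(error))
    }
}
```

Note that a callback registered inside a Spark task may fire after the task has already finished, so inside a transformation you will usually end up waiting for the result at some point anyway.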
What does matter is how you're going to submit or run a Spark job, since map alone won't do this: map is a transformation, and it only runs once an action triggers a job. I'd rather use foreachPartition, which is an action, to do the external call.
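A rough sketch of the per-partition idea follows. It is written against a plain `Iterator[Int]` so it runs without Spark. It is meant for `mapPartitions` rather than `foreachPartition`, which is a swap on my part: `foreachPartition` returns nothing, while the question needs the fetched values back. `fetchScore`, the batch size, and the timeout are all hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PartitionLookup {
  // Hypothetical stand-in for the REST client; a real client would make an HTTP call.
  def fetchScore(id: Int)(implicit ec: ExecutionContext): Future[Int] = Future(id * 10)

  // Processes one partition: starts the calls for a small batch of ids concurrently,
  // then blocks once per batch instead of once per element.
  def enrich(
      ids: Iterator[Int],
      batchSize: Int = 8,                     // assumed limit on in-flight calls
      timeout: FiniteDuration = 30.seconds    // assumed per-batch timeout
  ): Iterator[(Int, Int)] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    ids.grouped(batchSize).flatMap { batch =>
      val pending = Future.sequence(batch.map(id => fetchScore(id).map(score => (id, score))))
      Await.result(pending, timeout)
    }
  }
}
```

With a `Dataset[Int]` named `ds` and `spark.implicits._` in scope, this could be plugged in as `ds.mapPartitions(ids => PartitionLookup.enrich(ids))`. The thread still blocks, but only once per batch, and the calls within a batch overlap.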
Source: https://stackoverflow.com/questions/40932272/how-to-execute-async-operations-i-e-returning-a-future-from-map-filter-etc