Question
I have a DataSet.map operation that needs to pull data in from an external REST API. The REST API client returns a Future[Int].
Is it possible to have the DataSet.map operation somehow await this Future asynchronously? Or will I need to block the thread using Await.result? Or is this just not the done thing, i.e. should I instead try to load the data held by the API into a DataSet of its own and perform a join?
Thanks in advance!
EDIT:
Different from: Spark job with Async HTTP call
Reason: This question is open to discussing how to solve the problem differently, say, using a second DataSet and a join instead. Furthermore, the linked question contains no definitive answer as to whether Spark can handle asynchronous transformations and, if it can, how they should be structured.
Answer 1:
It's an interesting question (that I don't think is a duplicate of the other question either).
Yes, you can submit Spark jobs asynchronously, which is to say that the jobs are executed in the background, leaving the calling thread free to do whatever it wants after the call. This is SparkContext.submitJob.
Yes, you can run Spark jobs simultaneously from multiple threads using the very same SparkContext, i.e. SparkContext is thread-safe.
Given the two options, you can have a thread pool (using java.util.concurrent.Executors) and execute Spark jobs that in turn execute an asynchronous action, say "pull data in from an external REST API that returns a Future[Int]."
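A minimal sketch of the thread-pool idea, in plain Scala so that it runs without a cluster. The names ConcurrentJobs, runJob and runAll are hypothetical, and runJob is only a stand-in for a blocking Spark action (for example a count on a Dataset) issued against the shared SparkContext; the pool size of 4 and the one-minute timeout are arbitrary assumptions.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrentJobs {
  // Hypothetical stand-in for a blocking Spark action; a real application
  // would call an action on a Dataset or RDD here.
  def runJob(id: Int): Int = id * 10

  // Runs one job per id on a fixed thread pool and collects the results
  // in the same order as the input ids.
  def runAll(ids: Seq[Int]): Seq[Int] = {
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = ids.map(id => Future(runJob(id)))
      // Blocks the calling thread until every job has finished.
      Await.result(Future.sequence(jobs), 1.minute)
    } finally pool.shutdown()
  }
}
```

Calling ConcurrentJobs.runAll(Seq(1, 2, 3)) runs the three placeholder jobs concurrently and returns their results as one sequence.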
Now, this part has nothing to do with Spark: how you want to get notified about the result of a Future[Int] is up to you. You can Await it or just register a callback that is called when a Success or a Failure happens.
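The two ways of consuming a Future[Int] might look like the sketch below. FutureHandling, fetchScore, blocking and withCallback are hypothetical names, fetchScore merely imitates the REST client from the question, and the global execution context and the ten-second timeout are assumptions made for brevity.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureHandling {
  // Hypothetical stand-in for the REST client from the question.
  def fetchScore(id: Int): Future[Int] = Future(id * 2)

  // Option 1: block the calling thread until the result arrives
  // (or throw if the timeout expires or the Future failed).
  def blocking(id: Int): Int = Await.result(fetchScore(id), 10.seconds)

  // Option 2: register a callback; the calling thread is not blocked.
  def withCallback(id: Int)(handle: Either[Throwable, Int] => Unit): Unit =
    fetchScore(id).onComplete {
      case Success(value) => handle(Right(value))
      case Failure(error) => handle(Left(error))
    }
}
```

The callback variant returns immediately, so whatever needs the value has to live inside handle; that is part of why it does not fit naturally into a function that must return a result per record.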
What does matter is how you're going to submit or run a Spark job, since map alone won't do this: map is a transformation, so it is lazy and nothing runs until an action triggers a job. I'd rather use foreachPartition, which is an action, and do the external call there.
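One way to combine per-partition processing with the asynchronous client is sketched below, again in plain Scala. PartitionEnrichment, fetchScore and enrichPartition are hypothetical names and fetchScore only imitates the REST client. Note one deliberate deviation: because the question needs the fetched values back, the function returns an iterator, which is the shape Dataset.mapPartitions accepts (given an implicit Encoder), whereas the foreachPartition suggested above takes a function that returns nothing.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartitionEnrichment {
  // Hypothetical stand-in for the REST client from the question.
  def fetchScore(id: Int): Future[Int] = Future(id + 100)

  // Fires the requests for one partition concurrently and blocks the task
  // thread only once, when all of them are in flight.
  def enrichPartition(ids: Iterator[Int]): Iterator[(Int, Int)] = {
    val batch = ids.toVector // assumption: a partition fits in memory
    val scores = Future.traverse(batch)(fetchScore)
    batch.zip(Await.result(scores, 1.minute)).iterator
  }
}
```

With Spark on the classpath, something like ds.mapPartitions(PartitionEnrichment.enrichPartition) would apply it to a Dataset[Int]. Blocking once per partition rather than once per record is the design choice here; for large partitions the input could be processed in fixed-size groups instead of being materialized whole.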
Source: https://stackoverflow.com/questions/40932272/how-to-execute-async-operations-i-e-returning-a-future-from-map-filter-etc