How to execute async operations (i.e. returning a Future) from map/filter/etc.?

十年热恋 submitted on 2021-02-07 20:20:23

Question


I have a DataSet.map operation that needs to pull data in from an external REST API.

The REST API client returns a Future[Int].

Is it possible to have the DataSet.map operation somehow await this Future asynchronously? Or will I need to block the thread using Await.result? Or is this just not the done thing... i.e. should I instead try and load the data held by the API into a DataSet of its own, and perform a join?

Thanks in advance!

EDIT:

Different from: Spark job with Async HTTP call

Reason: This question is open to discussing how to solve the problem differently, say, using a second DataSet and a join instead. Furthermore, the linked question contains no definitive answer as to whether Spark can handle asynchronous transformations - and if it can - how they should be structured.


Answer 1:


It's an interesting question (that I don't think is a duplicate of the other question either).

Yes, you can submit Spark jobs asynchronously, which leaves the calling thread free to do whatever it wants after the call. This is SparkContext.submitJob.

Yes, you can run Spark jobs simultaneously from multiple threads using the very same SparkContext, i.e. SparkContext is thread-safe.

Given the two options, you can have a thread pool (using java.util.concurrent.Executors) and execute Spark jobs that in turn execute an asynchronous action, say "pull data in from an external REST API that returns a Future[Int]."
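The thread-pool idea could look roughly like the sketch below. It assumes Spark is on the classpath and runs in local mode; the object name, the pool size, and the two toy jobs are made up for illustration and are not from the question.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

import org.apache.spark.sql.SparkSession

object ConcurrentJobsSketch {
  // Runs two independent Spark jobs from two pool threads that share one SparkContext.
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")                 // assumption: local mode, for illustration only
      .appName("concurrent-jobs-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    try {
      val numbers = sc.parallelize(1 to 100)

      // Each Future body calls a blocking Spark action on a pool thread,
      // so the thread that called run() stays free until it chooses to wait.
      val sumJob   = Future { numbers.map(_.toLong).reduce(_ + _) }
      val evensJob = Future { numbers.filter(_ % 2 == 0).count() }

      (Await.result(sumJob, 1.minute), Await.result(evensJob, 1.minute))
    } finally {
      pool.shutdown()
      spark.stop()
    }
  }
}
```

Calling `ConcurrentJobsSketch.run()` waits for both jobs only at the very end. In a real application you would more likely attach callbacks to the two futures than block on them.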

Now, this part has nothing to do with Spark. How you get notified about the result of a Future[Int] is up to you: you can Await it, or register a callback that gets called when a Success or a Failure happens.
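To make the two choices concrete, here is a minimal sketch using only the Scala standard library. `fetchScore` is a hypothetical stand-in for the REST client from the question, and the 5-second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureHandlingSketch {
  // Hypothetical stand-in for the REST client; a real client would do network I/O.
  def fetchScore(id: Int): Future[Int] = Future { id * 10 }

  // Option 1: block the calling thread until the result arrives or the timeout expires.
  def blocking(id: Int): Int =
    Await.result(fetchScore(id), 5.seconds)

  // Option 2: register a callback; the calling thread returns immediately.
  def withCallback(id: Int)(handle: Either[Throwable, Int] => Unit): Unit =
    fetchScore(id).onComplete {
      case Success(value) => handle(Right(value))
      case Failure(error) => handle(Left(error))
    }
}
```

Inside code that runs on a Spark executor, the blocking variant is usually the simpler one to reason about, because the task has to hold on to its results until they are all available anyway.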

What does matter is how you're going to submit or run a Spark job, since map alone won't do it. map is a transformation: it is lazy and only describes the computation, so nothing runs until an action is called. I'd rather use foreachPartition, which is an action, to make the external call.
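One way to structure the per-partition work is sketched below, under a few assumptions: `fetchScore` is again a hypothetical stand-in for the REST client, and the batch size and timeout are arbitrary. The function takes and returns a plain Iterator, so it runs without Spark; that is also the shape mapPartitions expects if you want the results back as data. For pure side effects, the same batching loop could sit inside foreachPartition.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PartitionCallsSketch {
  // Hypothetical stand-in for the REST client from the question.
  def fetchScore(id: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future { id * 10 }

  // Processes one partition: starts the calls in small batches so that several
  // requests are in flight at once, then blocks once per batch for the results.
  def enrichPartition(ids: Iterator[Int], batchSize: Int = 16): Iterator[(Int, Int)] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    ids.grouped(batchSize).flatMap { batch =>
      val inFlight = Future.sequence(
        batch.map(id => fetchScore(id).map(score => id -> score))
      )
      Await.result(inFlight, 30.seconds)
    }
  }
}
```

With an RDD[Int] named `ids` this could be used as `ids.mapPartitions(it => PartitionCallsSketch.enrichPartition(it))`. Batching keeps the number of blocking waits at one per batch rather than one per element, while still bounding how many requests hit the API at the same time.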



Source: https://stackoverflow.com/questions/40932272/how-to-execute-async-operations-i-e-returning-a-future-from-map-filter-etc
