Spark job with Async HTTP call

遥遥无期 2020-12-05 16:10

I build an RDD from a list of URLs, and then try to fetch data with some async HTTP calls. I need all the results before doing other calculations. Ideally, I need to make the http

4 answers

囚心锁ツ (original poster)

2020-12-05 16:37

This won't work.

You cannot expect the request objects to be distributed across the cluster and the responses to be collected by other nodes. If you try, the Spark calls that wait on the futures will never finish, so futures will not work in this case.

If your map() makes synchronous HTTP requests, collect the responses within the same action/transformation call, and then pass the results (responses) to further map/reduce/other calls.

In your case, rewrite the logic so that each call collects its responses synchronously, and drop the futures; then everything should work.
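
As a rough illustration of the advice above, here is a minimal Python sketch, not the asker's actual code. It assumes PySpark and a function applied per partition; the names fetch_sync and fetch_partition, the timeout, and the worker count are all hypothetical. Requests may still overlap inside one task, but the function blocks until every response has arrived, so only finished results (never futures) are returned to Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_sync(url, timeout=10):
    """Blocking HTTP GET that returns the body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_partition(urls, fetch=fetch_sync, max_workers=8):
    """Fetch every URL of one partition and return (url, body) pairs.

    The thread pool lives only inside this call: the function waits for
    all requests to complete before returning, so no pending future
    ever leaves the task.
    """
    urls = list(urls)  # materialize the partition iterator
    if not urls:
        return iter([])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and blocks until each result is ready
        bodies = list(pool.map(fetch, urls))
    return iter(zip(urls, bodies))


# Intended use with PySpark (not executed here; assumes a SparkContext `sc`
# and a Python list `url_list`):
#   responses = sc.parallelize(url_list).mapPartitions(fetch_partition).collect()
```

Because collect() returns only after every partition has finished, all responses are available before any later calculation starts. If a request fails, the exception is raised inside the task, so real code would probably want error handling inside the fetch helper.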
