Spark job with Async HTTP call

遥遥无期 2020-12-05 16:10

I build an RDD from a list of URLs, and then try to fetch data with some async HTTP calls. I need all the results before doing other calculations. Ideally, I need to make the http

4 answers
  •  囚心锁ツ
    2020-12-05 16:37

    This won't work.

    You cannot expect the request objects to be distributed across the cluster and the responses to be collected by other nodes. If you do, the Spark calls that wait on the futures will never finish. Futures will not work in this case.

    If your map() makes synchronous HTTP requests, collect the responses within the same action/transformation call, and then pass the results (the responses) on to further map/reduce/other calls.

    In your case, rewrite the logic so that each call collects its response synchronously, and remove the notion of futures. Then all should be fine.
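
As a rough illustration of the advice above, here is a minimal PySpark sketch in which every request is made synchronously inside the transformation, so each executor returns finished responses rather than futures. The original code is not shown in the thread, so everything here is an assumption: `fetch_sync`, `fetch_partition`, and `run` are hypothetical names, the standard-library `urlopen` stands in for whatever HTTP client was actually used, and `sc` is assumed to be an existing SparkContext.

```python
from typing import Callable, Iterable, Iterator, Tuple
from urllib.request import urlopen


def fetch_sync(url: str, timeout: float = 10.0) -> str:
    """Blocking HTTP GET that returns the body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_partition(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_sync,
) -> Iterator[Tuple[str, str]]:
    """Fetch every URL of one partition, one after another, on the executor.

    Each response is fully read before it is yielded, so no future or other
    unfinished handle ever leaves the task.
    """
    for url in urls:
        yield url, fetch(url)


def run(sc, urls, num_partitions=8):
    """Assumes `sc` is an existing SparkContext; not called in this sketch."""
    rdd = sc.parallelize(urls, num_partitions)
    # The blocking requests happen inside this transformation.
    responses = rdd.mapPartitions(fetch_partition)
    # Later stages only ever see completed (url, body) pairs.
    return responses.mapValues(len).collect()
```

Parallelism in this sketch comes from the number of partitions rather than from futures: each task blocks on its own requests, and the downstream `mapValues` step runs only on completed responses.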
