I'm running a Spark job. The UI shows that all of the jobs completed, but after a couple of minutes the entire job restarts and the UI again shows all of the jobs running.
What eventually solved this was persisting both DataFrames before the join, roughly as in the sketch below.
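A minimal sketch of the fix, assuming two large DataFrames read from hypothetical parquet paths and joined on an assumed "id" column (the paths, names, and key are illustrative, not from my actual job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("persist-before-join").getOrCreate()

val left  = spark.read.parquet("/data/left")   // hypothetical path
val right = spark.read.parquet("/data/right")  // hypothetical path

// Persist both sides before joining. Materializing them (e.g. via count())
// makes Spark compute actual sizes instead of relying on estimates.
val leftP  = left.persist(StorageLevel.MEMORY_AND_DISK)
val rightP = right.persist(StorageLevel.MEMORY_AND_DISK)
leftP.count()
rightP.count()

val joined = leftP.join(rightP, Seq("id"))  // "id" is an assumed join key
```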
I looked at the execution plan before and after persisting the DataFrames. The strange thing was that before persisting, Spark tried to perform a BroadcastHashJoin, which clearly failed due to the large size of the DataFrame; after persisting, the execution plan showed the join would be a ShuffleHashJoin, and it completed without any issues whatsoever. A bug? Maybe. I'll try a newer Spark version when I get to it.
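For anyone who wants to check which strategy Spark picks, explain() prints the physical plan, and automatic broadcast joins can be disabled outright as a workaround when Spark's size estimate wrongly qualifies a large DataFrame for broadcasting. A sketch, reusing the `joined` DataFrame and `spark` session from above:

```scala
// Print the physical plan; the join node will read BroadcastHashJoin,
// ShuffledHashJoin, SortMergeJoin, etc.
joined.explain()

// Setting the broadcast threshold to -1 disables automatic broadcast joins,
// forcing Spark to fall back to a shuffle-based strategy.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```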