Spark job restarted after showing all jobs completed and then fails (TimeoutException: Futures timed out after [300 seconds])

长发绾君心 2021-01-03 02:39

I'm running a Spark job. The Spark UI shows that all of the jobs were completed; however, after a couple of minutes the entire job restarts. This time it again shows all jobs as completed, but then fails with TimeoutException: Futures timed out after [300 seconds].

1 Answer
  • 2021-01-03 03:07

    What eventually solved this was persisting both data frames before the join.

    I looked at the execution plan before and after persisting the data frames. The strange thing was that before persisting, Spark tried to perform a BroadcastHashJoin, which clearly failed due to the large size of the data frames; after persisting, the execution plan showed the join would be a ShuffleHashJoin, which completed without any issues whatsoever. A bug? Maybe. I'll try a newer Spark version when I get to it.
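
    A minimal sketch of the fix described above, assuming two hypothetical DataFrames read from Parquet and a join key column named "id" (the names, paths, and key are illustrative and not from the original post):

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("persist-before-join").getOrCreate()

    // df1/df2 and the paths are hypothetical stand-ins for the two large inputs.
    val df1 = spark.read.parquet("/data/left")
    val df2 = spark.read.parquet("/data/right")

    // Persisting both sides materializes them, which in this case steered the
    // planner away from choosing a BroadcastHashJoin for oversized inputs.
    val left  = df1.persist(StorageLevel.MEMORY_AND_DISK)
    val right = df2.persist(StorageLevel.MEMORY_AND_DISK)

    // "id" is an assumed join key; replace with the actual key column(s).
    val joined = left.join(right, Seq("id"))

    // Print the physical plan to confirm which join strategy was chosen.
    joined.explain()
    ```

    As a side note, the 300-second timeout in the error matches Spark's default spark.sql.broadcastTimeout, which is consistent with a broadcast join attempt timing out. If the planner still insists on broadcasting, broadcast joins can be disabled outright with spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1), forcing a shuffle-based join regardless of estimated sizes.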
