Experts, I am noticing one peculiar thing with one of the PySpark jobs in production (running in YARN cluster mode). After executing for around an hour plus (around 65-75 mins),
Are you breaking the lineage? If not, the issue might be a long lineage. Try breaking the lineage somewhere in the middle of the code and see if that helps.
# Spark 1.6 code
# Note: in YARN cluster mode the checkpoint directory should live on HDFS;
# '.' is shown here only as a placeholder.
sc.setCheckpointDir('.')
# df is the original DataFrame you are performing transformations on
dfrdd = df.rdd
# Mark the RDD for checkpointing; its lineage is truncated once it is materialized
dfrdd.checkpoint()
df = sqlContext.createDataFrame(dfrdd)
# An action is needed to actually trigger the checkpoint
print df.count()
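If you ever move to Spark 2.1+, the round trip through the RDD API is no longer needed, since DataFrames have a checkpoint() method of their own. A minimal sketch, assuming a SparkSession named spark and a hypothetical HDFS checkpoint path:

# Spark 2.1+ sketch (assumes a SparkSession named `spark`; the path is hypothetical)
spark.sparkContext.setCheckpointDir('hdfs:///tmp/checkpoints')
df = df.checkpoint(eager=True)  # materializes df and truncates its lineage
print(df.count())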
Let me know if it helps.