Why are PySpark jobs dying in the middle of processing without any particular error?

清酒与你 2020-12-20 07:26

Experts, I am noticing one peculiar thing with one of the PySpark jobs in production (running in YARN cluster mode). After executing for a bit over an hour (around 65-75 minutes),

2 Answers
  •  鱼传尺愫
    2020-12-20 08:00

    Are you breaking the lineage? If not, the issue might be a lineage that has grown too long. Try checkpointing somewhere in the middle of the code to break the lineage, for example:

    # Spark 1.6 code
    sc.setCheckpointDir('.')                  # directory where checkpoint files are written
    # df is the original DataFrame you are performing transformations on
    dfrdd = df.rdd                            # drop to the RDD API so it can be checkpointed
    dfrdd.checkpoint()                        # mark the RDD for checkpointing (truncates the lineage)
    df = sqlContext.createDataFrame(dfrdd)    # rebuild the DataFrame from the checkpointed RDD
    print(df.count())                         # action that forces the checkpoint to materialise
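
    If you are on Spark 2.x or later, the DataFrame API exposes checkpoint() directly, so the detour through the RDD is not needed. A minimal sketch, assuming a SparkSession named spark and an illustrative checkpoint directory (both are assumptions, not from the original answer):

    # Spark 2.x+ sketch (assumes a SparkSession named `spark`; the path is illustrative)
    spark.sparkContext.setCheckpointDir('/tmp/spark-checkpoints')
    # df is the DataFrame accumulating a long chain of transformations
    df = df.checkpoint()     # eagerly materialises df and truncates its lineage
    print(df.count())        # subsequent actions read from the checkpointed data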
    

    Let me know if it helps.
