Iterative code with a long-lineage RDD causes StackOverflowError in Apache Spark

大兔子大兔子 posted on 2019-12-04 15:53:09

The documentation for RDD.checkpoint says:

This function must be called before any job has been executed on this RDD

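And indeed, if you change your code slightly so that checkpoint() is called before a is collected, it runs with no StackOverflowError. The checkpoint is only written when the next job runs on that RDD, so it has to be requested before the action, not after it: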

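// Assumes `a` is a var holding an RDD of numbers and that
// sc.setCheckpointDir(...) has already been called.
for (i <- 1 to 1000) {
  a = a.map(x => x + 1).persist()

  // Mark the RDD for checkpointing BEFORE any job has run on it.
  if (i % 100 == 0) {
    a.checkpoint()
  }

  // The action runs the job; on checkpointed iterations it also
  // writes the checkpoint and truncates the lineage.
  val b = a.collect()

  print(".")
}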
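For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same loop. The object name CheckpointLoop, the run helper, the local[2] master, the temporary checkpoint directory and the 1 to 10 input are all assumptions made for illustration and are not part of the original question. It also uses count() rather than collect() as the action inside the loop, since any action triggers the pending checkpoint.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object; the names here are illustrative only.
object CheckpointLoop {

  // Applies map(_ + 1) `iterations` times, checkpointing every `every`
  // iterations, and returns the final contents of the RDD.
  def run(sc: SparkContext, iterations: Int, every: Int): Array[Int] = {
    var a: RDD[Int] = sc.parallelize(1 to 10) // assumed sample input
    for (i <- 1 to iterations) {
      a = a.map(x => x + 1).persist()
      if (i % every == 0) {
        // Must be requested before the first job runs on this RDD.
        a.checkpoint()
      }
      // Any action works here; it materializes the RDD and, when a
      // checkpoint was requested, saves it and cuts the lineage.
      a.count()
    }
    a.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-loop").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // checkpoint() requires a checkpoint directory to be set first.
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)
      println(run(sc, 1000, 100).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the checkpoint directory would normally be a path on shared storage such as HDFS rather than a local temporary directory.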