Spark throws a stack overflow error when unioning a lot of RDDs

Deadly, posted on 2019-11-28 00:44:33
Sean Owen

Use SparkContext.union(...) instead to union many RDDs at once.

You don't want to union them one at a time like that, since RDD.union() adds a new step to the lineage (an extra set of stack frames on any computation) for each RDD, whereas SparkContext.union() does it all at once. This will ensure you don't get a stack-overflow error.
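As a rough illustration of the two approaches in PySpark, here is a minimal sketch. The helper names union_chained and union_flat are hypothetical, sc is assumed to be an existing SparkContext, and rdds is assumed to be a non-empty list of RDDs; only RDD.union() and SparkContext.union() are actual Spark calls.

```python
from functools import reduce


def union_chained(rdds):
    """Union the RDDs pairwise, one at a time.

    Each step calls RDD.union() on the previous result, so the lineage
    gets one level deeper per input RDD. This is the pattern to avoid.
    """
    return reduce(lambda left, right: left.union(right), rdds)


def union_flat(sc, rdds):
    """Union all the RDDs in a single SparkContext.union() call.

    `sc` is assumed to be a live SparkContext; the result has every
    input RDD as a direct parent instead of a long chain.
    """
    return sc.union(list(rdds))
```

With a running context, union_flat(sc, rdds) would replace a loop such as result = result.union(rdd) over the same list.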

It seems that unioning RDDs one by one can lead to a very long series of recursive function calls. In that case we need to increase the JVM thread stack size. In Spark, the option --driver-java-options "-Xss100m" sets the driver JVM's thread stack size to 100 MB (note that there is no space between -Xss and the size).

Sean Owen's solution also solves the problem, in a more elegant way.
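To see why one-by-one unions produce such deep call chains while a single union does not, here is a toy model in plain Python. It is not Spark's implementation: Leaf, Union, chained, flat and depth are hypothetical names, and the only assumption is that each union node keeps references to its parent datasets, as the answers above describe.

```python
class Leaf:
    """Stands in for a source RDD with no parents (toy model only)."""
    parents = ()


class Union:
    """Stands in for a union node that remembers its parent datasets."""

    def __init__(self, parents):
        self.parents = tuple(parents)


def chained(leaves):
    """Union one at a time: every step wraps the previous result."""
    result = leaves[0]
    for leaf in leaves[1:]:
        result = Union([result, leaf])
    return result


def flat(leaves):
    """Union everything at once: one node with all leaves as parents."""
    return Union(leaves)


def depth(node):
    """Length of the longest parent chain below `node`.

    Walks the graph with an explicit stack so that measuring a deep
    chain does not itself hit Python's recursion limit.
    """
    deepest = 0
    stack = [(node, 0)]
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((parent, level + 1) for parent in current.parents)
    return deepest
```

For n leaves, depth(chained(leaves)) is n - 1 while depth(flat(leaves)) is 1, so any code that walks the parents recursively needs a stack proportional to the number of unions in the chained case.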
