Spark SQL fails because “Constant pool has grown past JVM limit of 0xFFFF”

Submitted by 故事扮演 on 2019-12-03 14:40:45
Andrew

This is due to a known JVM limitation: the constant pool of a generated Java class cannot grow beyond 64K entries (0xFFFF).

This limitation was worked around in SPARK-18016, which is fixed in Spark 2.3, scheduled for release in January 2018.
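A hedged sketch of the kind of job that can hit this limit on Spark versions before 2.3. The column count and expressions here are illustrative assumptions, not a guaranteed repro; the exact threshold depends on the query:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object WideSchemaRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-schema-repro")
      .master("local[*]")
      .getOrCreate()

    // Build a very wide schema: a single row with a few thousand columns.
    // On Spark < 2.3, codegen for such a plan can push the generated
    // class's constant pool past the 0xFFFF entry limit.
    val cols = (1 to 4000).map(i => lit(i).as(s"c$i"))  // 4000 is an arbitrary illustrative count
    val wide = spark.range(1).select(cols: _*)

    // A projection touching every column may fail with
    // "Constant pool has grown past JVM limit of 0xFFFF".
    wide.select(wide.columns.map(c => (lit(1) + wide(c)).as(c)): _*).show()

    spark.stop()
  }
}
```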

Nhan Trinh

I solved this problem by dropping all the unused columns in the DataFrame, or by selecting only the columns you actually need.

It turns out that Spark DataFrames cannot handle extremely wide schemas. There is no specific number of columns at which Spark breaks with "Constant pool has grown past JVM limit of 0xFFFF", since it depends on the kind of query, but reducing the number of columns can help to work around this issue. A sketch of the workaround follows.
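A minimal sketch of this workaround, assuming a hypothetical wide Parquet table at `/path/to/wide_table` and illustrative column names:

```scala
import org.apache.spark.sql.SparkSession

object NarrowSchemaWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("narrow-schema-workaround")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical wide input with hundreds or thousands of columns.
    val df = spark.read.parquet("/path/to/wide_table")

    // Option 1: select only the columns the query actually needs, so the
    // generated code (and its constant pool) stays small.
    val narrow = df.select("id", "event_time", "amount")
    narrow.groupBy("id").sum("amount").show()

    // Option 2: drop columns you know are unused before running the query.
    val trimmed = df.drop("raw_payload", "debug_blob")
    trimmed.createOrReplaceTempView("trimmed")
    spark.sql("SELECT id, COUNT(*) FROM trimmed GROUP BY id").show()

    spark.stop()
  }
}
```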

The underlying root cause is the JVM's 64K constant pool limit for generated Java classes; see also Andrew's answer.

For future reference, this issue was fixed in Spark 2.3 (as Andrew noted).

If you encounter this issue on Amazon EMR, upgrade to release 5.13 or later.
