发表新帖

发表新帖

Spark java.lang.OutOfMemoryError: Java heap space

后端未结

关注

 12  2131

半阙折子戏 2020-11-22 13:55

My cluster: 1 master, 11 slaves, each node has 6 GB memory.

My settings:

spark.executor.memory=4g, Dspark.akka.frameSize=512

12条回答

执念已碎 (楼主)

2020-11-22 14:16
From my understanding of the code provided above, it loads the file and does map operation and saves it back. There is no operation that requires shuffle. Also, there is no operation that requires data to be brought to the driver hence tuning anything related to shuffle or driver may have no impact. The driver does have issues when there are too many tasks but this was only till spark 2.0.2 version. There can be two things which are going wrong.
- There are only one or a few executors. Increase the number of executors so that they can be allocated to different slaves. If you are using yarn need to change num-executors config or if you are using spark standalone then need to tune num cores per executor and spark max cores conf. In standalone num executors = max cores / cores per executor .
- The number of partitions are very few or maybe only one. So if this is low even if we have multi-cores,multi executors it will not be of much help as parallelization is dependent on the number of partitions. So increase the partitions by doing imageBundleRDD.repartition(11)
0 讨论(0)

查看其它12个回答
发布评论:

提交评论
- 加载中...

热议问题