Boosting spark.yarn.executor.memoryOverhead

After a couple of hours I found the solution to this problem. When creating the cluster, I needed to pass the following flag as a parameter:

--configurations file://./sparkConfig.json

With the JSON file containing:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "10G"
    }
  }
]
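
For context, here is a rough sketch of how that flag might fit into a full aws emr create-cluster call; the cluster name, release label, and instance settings are placeholders rather than values from the original setup:

aws emr create-cluster \
    --name "spark-cluster" \
    --release-label emr-5.30.0 \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --configurations file://./sparkConfig.json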

This allows me to increase the memoryOverhead in the next step by using the parameter I initially posted.
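
That per-job override is typically passed to spark-submit along these lines; the overhead value and the application path are placeholders, since the exact parameter from the original question isn't reproduced above:

spark-submit \
    --conf spark.yarn.executor.memoryOverhead=3000 \
    my_app.py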

If you are logged into an EMR node and want to alter Spark's default settings further without going through the AWS CLI, you can add a line to the spark-defaults.conf file. On EMR, Spark's configuration lives under /etc, so you can view or edit /etc/spark/conf/spark-defaults.conf directly.

So in this case we'd append spark.yarn.executor.memoryOverhead to the end of the spark-defaults file, and the end of the file then looks something like this:

spark.driver.memory              1024M
spark.executor.memory            4305M
spark.default.parallelism        8
spark.logConf                    true
spark.executorEnv.PYTHONPATH     /usr/lib/spark/python
spark.driver.maxResultSize       0
spark.worker.timeout             600
spark.storage.blockManagerSlaveTimeoutMs 600000
spark.executorEnv.PYTHONHASHSEED 0
spark.akka.timeout               600
spark.sql.shuffle.partitions     300
spark.yarn.executor.memoryOverhead 1000M
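
One way to append that line from a shell on the node, assuming sudo access and the default EMR path shown above:

echo "spark.yarn.executor.memoryOverhead 1000M" | sudo tee -a /etc/spark/conf/spark-defaults.conf

Applications submitted after the change pick up the new default; jobs that are already running are unaffected.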

Similarly, the heap size can be controlled with the --executor-memory=xg flag or the spark.executor.memory property.
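
For example, both of these are equivalent on spark-submit (the 8g value and application path are placeholders):

spark-submit --executor-memory 8g my_app.py
spark-submit --conf spark.executor.memory=8g my_app.py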

Hope this helps...
