AWS Glue executor memory limit

元气小坏坏 提交于 2019-11-30 18:46:06

The official glue documentation suggests that glue doesn't support custom spark config.

There are also several argument names used by AWS Glue internally that you should never set:

--conf — Internal to AWS Glue. Do not set!

--debug — Internal to AWS Glue. Do not set!

--mode — Internal to AWS Glue. Do not set!

--JOB_NAME — Internal to AWS Glue. Do not set!

Any better suggestion on solving this problem?

You can override the parameters by editing the job and adding job parameters. The key and value I used are here:

Key: --conf

Value: spark.yarn.executor.memoryOverhead=7g

This seemed counterintuitive since the setting key is actually in the value, but it was recognized. So if you're attempting to set spark.yarn.executor.memory the following parameter would be appropriate:

Key: --conf

Value: spark.yarn.executor.memory=7g

  1. Open Glue> Jobs > Edit your Job> Script libraries and job parameters (optional) > Job parameters near the bottom
  2. Set the following: key: --conf value: spark.yarn.executor.memoryOverhead=1024 spark.driver.memory=10g

despite aws documentation stating that the --conf parameter should not be passed, our AWS support team told us to pass --conf spark.driver.memory=10g which corrected the issue we were having

I hit out of memory errors like this when I had a highly skewed dataset. In my case, I had a bucket of json files that contained dynamic payloads that were different based on the event type indicated in the json. I kept hitting Out of Memory errors no matter if I used the configuration flags indicated here and increased the DPUs. It turns out that my events were highly skewed to a couple of the event types being > 90% of the total data set. Once I added a "salt" to the event types and broke up the highly skewed data I did not hit any out of memory errors.

Here's a blog post for AWS EMR that talks about the same Out of Memory error with highly skewed data. https://medium.com/thron-tech/optimising-spark-rdd-pipelines-679b41362a8a

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!