AWS Glue - can't set spark.yarn.executor.memoryOverhead

陌路散爱 submitted on 2019-12-05 17:49:43

Unfortunately, the current version of AWS Glue doesn't support this functionality. You can only set the parameters that the UI exposes. If you need this level of control over Spark settings, consider using AWS EMR instead of AWS Glue.

When I ran into a similar problem, I reduced the number of shuffles and the amount of data shuffled, and increased the DPU count. While working on it, I relied on the following articles; I hope they are useful. A minimal sketch of the shuffle-reducing ideas follows the links.

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/

https://www.indix.com/blog/engineering/lessons-from-using-spark-to-process-large-amounts-of-data-part-i/

https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/sparksqlshufflepartitions_draft.html
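As a concrete illustration of "shuffle less data", here is a minimal PySpark sketch; the bucket paths and column names are made up for the example, not taken from any real job:

```python
# Minimal PySpark sketch of the shuffle-reducing ideas above; the
# paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    # Fewer shuffle partitions than the default of 200 can help when
    # each partition would otherwise be tiny.
    .config("spark.sql.shuffle.partitions", "64")
    .getOrCreate()
)

events = spark.read.parquet("s3://my-bucket/events/")  # hypothetical path

# Project only the needed columns *before* the shuffle so less data
# moves across the network during the groupBy.
daily_totals = (
    events.select("user_id", "event_date", "bytes")
          .groupBy("user_id", "event_date")
          .agg(F.sum("bytes").alias("total_bytes"))
)

daily_totals.write.mode("overwrite").parquet("s3://my-bucket/daily_totals/")
```

The key idea is to project and filter before the aggregation, so only the columns that actually feed it cross the network.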


Updated: 2019-01-13

Amazon recently added a new section to the AWS Glue documentation that describes how to monitor and optimize Glue jobs. I find it very useful for understanding where a memory problem comes from and how to avoid it.

https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-glue-job-cloudwatch-metrics.html
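To see the memory profiles that documentation describes, the job needs metrics collection turned on. A hedged boto3 sketch, assuming the --enable-metrics special parameter documented for Glue jobs; the job name, role ARN, and script path are placeholders:

```python
# Hedged sketch: enable Glue's CloudWatch job metrics when creating a job
# via boto3, so the memory profiles in the linked docs show up.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-glue-job",                                # placeholder
    Role="arn:aws:iam::123456789012:role/MyGlueRole",  # placeholder
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
    },
    DefaultArguments={
        # Documented Glue special parameter; turns on per-executor
        # memory/CPU metrics in CloudWatch. It takes no value.
        "--enable-metrics": "",
    },
)
```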

  • Open Glue > Jobs > Edit your Job > Script libraries and job parameters (optional) > Job parameters near the bottom

  • Set the following job parameter — key: --conf, value: spark.yarn.executor.memoryOverhead=1024
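The same change can be made programmatically. A hedged boto3 sketch of the console steps above; the job name is a placeholder, and since UpdateJob resets any field left out of JobUpdate, the sketch starts from the current job definition and merges the new argument in:

```python
# Hedged boto3 sketch mirroring the console steps: add --conf to an
# existing Glue job's default arguments. Job name is a placeholder.
import boto3

glue = boto3.client("glue")

job = glue.get_job(JobName="my-glue-job")["Job"]  # placeholder name

# Merge the new argument into the existing ones rather than replacing them.
args = dict(job.get("DefaultArguments", {}))
args["--conf"] = "spark.yarn.executor.memoryOverhead=1024"

glue.update_job(
    JobName=job["Name"],
    JobUpdate={
        # UpdateJob removes or resets unspecified configuration, so carry
        # over the current role and command.
        "Role": job["Role"],
        "Command": job["Command"],
        "DefaultArguments": args,
    },
)
```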
