Amazon Elastic MapReduce Bootstrap Actions not working

拟墨画扇 submitted on 2019-12-03 17:31:49

You have two options to achieve this:

Custom JVM Settings

To apply custom settings, have a look at the Bootstrap Actions documentation for Amazon Elastic MapReduce (Amazon EMR), specifically the Configure Daemons action:

This predefined bootstrap action lets you specify the heap size or other Java Virtual Machine (JVM) options for the Hadoop daemons. You can use this bootstrap action to configure Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use this bootstrap action to modify advanced JVM options, such as garbage collection behavior.

An example is provided as well, which sets the NameNode heap size to 2048 and passes a JVM garbage-collection option to the NameNode daemon:

$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --namenode-heap-size=2048,--namenode-opts=-XX:GCTimeRatio=19   

Predefined JVM Settings

Alternatively, as per the FAQ How do I configure Hadoop settings for my job flow?, if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. A predefined bootstrap action, Configure Memory-Intensive Workloads, is available for this situation. It sets cluster-wide Hadoop settings on startup to values appropriate for job flows with memory-intensive workloads, for example:

$ ./elastic-mapreduce --create \
--bootstrap-action \
  s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive

The specific configuration settings applied by this predefined bootstrap action are listed in Hadoop Memory-Intensive Configuration Settings.

Good luck!

Steffen's answer is good and works. On the other hand, if you just want something quick and dirty and only need to override one or two variables, you can set them directly on the command line like the following:

elastic-mapreduce --create \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.child.java.opts=-Xmx999m"
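If you need to override more than one variable, the --args string can be assembled rather than typed by hand. The following is only a sketch: build_hadoop_args is a hypothetical helper (not part of the EMR CLI), and it assumes that each additional override is passed as another comma-separated "-m,key=value" pair, following the form of the command above. It prints the command instead of running it, so you can check the result first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the EMR CLI): turn key=value overrides
# into the comma-separated "-m,key=value" form used with --args above.
# Assumption: repeated "-m" pairs are accepted for multiple overrides.
build_hadoop_args() {
  out=""
  for kv in "$@"; do
    # Separate successive pairs with a comma.
    if [ -n "$out" ]; then
      out="$out,"
    fi
    out="${out}-m,${kv}"
  done
  printf '%s\n' "$out"
}

ARGS=$(build_hadoop_args "mapred.child.java.opts=-Xmx999m")

# Dry run: print the command rather than launching a job flow.
echo elastic-mapreduce --create \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "\"$ARGS\""
```

Note that a value containing a comma would be split incorrectly by this approach, so it only suits simple settings such as a single -Xmx value.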

I've also seen other, older documentation that simply wraps the entire expression in a single pair of quotes, like the following:

--bootstrap-action "s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m \
    mapred.child.java.opts=-Xmx999m"    ### I tried this style, it no longer works!

At any rate, this is not easy to find in the AWS EMR documentation. I suspect that mapred.child.java.opts is one of the most frequently overridden variables. I was looking for an answer myself after hitting a GC error, "java.lang.OutOfMemoryError: GC overhead limit exceeded", and stumbled on this page. The default of 200m is just too small (documentation on defaults).

Good luck!
