Hadoop Number of Reducers Configuration Options Priority

Submitted by ≯℡__Kan透↙ on 2019-12-10 23:44:20

Question


What are the priorities of the following three options for setting the number of reducers? In other words, if all three are set, which one takes effect?

Option1:

setNumReduceTasks(2) within the application code

Option2:

-D mapreduce.job.reduces=2 as a command-line argument

Option3:

through the $HADOOP_CONF_DIR/mapred-site.xml file:

 <property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
 </property>

Answer 1:


You have them ranked in priority order: option 1 will override option 2, and option 2 will override option 3. In other words, option 1 is the one your job will use in this scenario.
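Answer 1's ranking follows from the fact that all three options end up as writes to the same key, mapreduce.job.reduces, in the job's configuration, and the last write wins. The sketch below is a simplified model, not Hadoop's own classes. A plain map stands in for org.apache.hadoop.conf.Configuration, and the layers are applied in the order a typical ToolRunner-based driver performs them: site files first, then -D options parsed by GenericOptionsParser, then setNumReduceTasks called inside run(). The class name and the values 5, 3 and 2 are hypothetical. A null argument means "this option was not set".

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (not Hadoop's API): a map stands in for the job Configuration.
class ReducerPrecedenceModel {
    static final String KEY = "mapreduce.job.reduces";

    // Applies each layer that is present (non-null), in the order a typical
    // ToolRunner driver applies them, and returns the value the job ends up with.
    static int resolve(Integer siteFileValue, Integer dashDValue, Integer codeValue) {
        Map<String, String> conf = new HashMap<>();
        // 1. Defaults from $HADOOP_CONF_DIR/mapred-site.xml
        if (siteFileValue != null) conf.put(KEY, siteFileValue.toString());
        // 2. -D key=value, applied by GenericOptionsParser before run() is called
        if (dashDValue != null) conf.put(KEY, dashDValue.toString());
        // 3. job.setNumReduceTasks(n) inside run(), i.e. after the -D options were parsed
        if (codeValue != null) conf.put(KEY, codeValue.toString());
        // Hadoop's shipped default (mapred-default.xml) for this key is 1
        return Integer.parseInt(conf.getOrDefault(KEY, "1"));
    }

    public static void main(String[] args) {
        System.out.println(resolve(5, 3, 2));       // code value wins
        System.out.println(resolve(5, 3, null));    // -D beats the site file
        System.out.println(resolve(5, null, null)); // only the site file is set
    }
}
```

Each lower layer only matters when every layer above it is absent, which is what "option 1 overrides 2, and 2 overrides 3" means in practice.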




Answer 2:


According to Hadoop: The Definitive Guide:

The -D option is used to set the configuration property with key color to the value yellow [in the book's example, -D color=yellow]. Options specified with -D take priority over properties from the configuration files. This is very useful because you can put defaults into configuration files and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This will override the number of reducers set on the cluster or set in any client-side configuration files.
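The quote uses mapred.reduce.tasks, the older property name. In Hadoop 2 and later it is deprecated in favor of mapreduce.job.reduces, and Hadoop's Configuration maps the old name onto the new one.

The sketch below models only the quote's claim that -D overrides the configuration files. It assumes Java 9+ for Map.of. It is not Hadoop's GenericOptionsParser:

- It handles only the "-D key=value" form used in the question.
- The one-entry DEPRECATED table is a hypothetical stand-in for Hadoop's much larger deprecation mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, not Hadoop's GenericOptionsParser: applies "-D key=value"
// pairs on top of properties loaded from the site files.
class DashDOverrideModel {
    // Hypothetical stand-in for Hadoop's deprecation table (one entry only).
    static final Map<String, String> DEPRECATED =
            Map.of("mapred.reduce.tasks", "mapreduce.job.reduces");

    // Translates an old property name to its current name, if one is known.
    static String canonical(String key) {
        return DEPRECATED.getOrDefault(key, key);
    }

    static Map<String, String> apply(Map<String, String> siteFileProps, String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Start from the values read from the configuration files
        siteFileProps.forEach((k, v) -> conf.put(canonical(k), v));
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-D")) {
                String[] kv = args[i + 1].split("=", 2);
                // A -D value overwrites whatever the files provided for the same key
                if (kv.length == 2) conf.put(canonical(kv[0]), kv[1]);
                i++; // skip the key=value token we just consumed
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> site = Map.of("mapreduce.job.reduces", "2");
        Map<String, String> conf = apply(site, new String[] {"-D", "mapred.reduce.tasks=7"});
        System.out.println(conf.get("mapreduce.job.reduces")); // 7
    }
}
```

Normalizing both names to one key is what lets the old name in the quote override the new name in the question's mapred-site.xml.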




Answer 3:


First priority: configuration parameters passed on the command line (when submitting the MapReduce application)

Second priority: configuration parameters set in the application code

Third priority: default parameters read from the XML configuration files, such as core-site.xml, hdfs-site.xml and mapred-site.xml. The same configuration directory also holds hadoop-env.sh and log4j.properties, but those set environment variables and logging rather than job parameters.
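Answers 1 and 3 disagree on whether the command line or the application code wins. Both rankings can be observed, because the outcome depends on whether the code writes the property before or after the -D options are applied.

With ToolRunner.run(conf, tool, args), GenericOptionsParser writes the -D values into the same Configuration before the tool's run() method is called. The two orderings therefore behave differently:

- A call to job.setNumReduceTasks(n) inside run() comes later, so it wins. This is Answer 1's ranking.
- A call to conf.setInt(...) in main() before ToolRunner.run comes earlier, so -D wins. This is Answer 3's ranking.

The model below is a hypothetical simplification, not Hadoop code; the class name and values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, not Hadoop code: every option is a write to the same key
// of one shared configuration, so whichever write happens last wins.
class ParseOrderModel {
    static final String KEY = "mapreduce.job.reduces";

    static int finalReducers(int siteValue, int dashDValue, int codeValue,
                             boolean codeRunsBeforeParsing) {
        Map<String, Integer> conf = new HashMap<>();
        conf.put(KEY, siteValue);          // loaded from mapred-site.xml
        if (codeRunsBeforeParsing) {
            conf.put(KEY, codeValue);      // e.g. conf.setInt(KEY, n) in main(), before ToolRunner.run
            conf.put(KEY, dashDValue);     // GenericOptionsParser then applies -D on top
        } else {
            conf.put(KEY, dashDValue);     // ToolRunner.run applies -D first...
            conf.put(KEY, codeValue);      // ...then run() calls job.setNumReduceTasks(n)
        }
        return conf.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println(finalReducers(5, 3, 2, false)); // 2: code wins (Answer 1's ranking)
        System.out.println(finalReducers(5, 3, 2, true));  // 3: -D wins (Answer 3's ranking)
    }
}
```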



Source: https://stackoverflow.com/questions/20696449/hadoop-number-of-reducers-configuration-options-priority
