How to write 'map only' hadoop jobs?

六眼飞鱼酱① 提交于 2019-12-29 19:03:10

问题


I'm a novice on hadoop, I'm getting familiar to the style of map-reduce programing but now I faced a problem : Sometimes I need only map for a job and I only need the map result directly as output, which means reduce phase is not needed here, how can I achive that?


回答1:


This turns off the reducer.

job.setNumReduceTasks(0);

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)




回答2:


You can also use the IdentityReducer:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/IdentityReducer.html




回答3:


Can be quite helpful when you need to launch job with mappers only from terminal. You can turn off reducers by specifing 0 reducers in hadoop jar command implicitly:

-D mapred.reduce.tasks=0 

So the result command will be following:

hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir

To be backward compatible, Hadoop also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".




回答4:


If you are using oozie as a scheduler to manager your hadoop jobs, then you can just set the property mapred.reduce.tasks(which is the default number of reduce tasks per job) to 0. You can add your mapper in the property mapreduce.map.class, and also there will be no need to add the property mapreduce.reduce.class since reducers are not required.

<configuration>
   <property>
     <name>mapreduce.map.class</name>
     <value>my.com.package.AbcMapper</value>
   </property>
   <property>
     <name>mapred.reduce.tasks</name>
     <value>0</value>
   </property>
   .
   .
   .
<configuration>


来源:https://stackoverflow.com/questions/9394409/how-to-write-map-only-hadoop-jobs

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!