In my map-reduce job I use 4 reducers, so the final output is split across 4 part files: part-00000, part-00001, part-00002, part-00003. Is there a way to get the output in a single file instead?
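For context, this is how the reducer count is set in the driver (a fragment, assuming the old mapred API; conf is the job's JobConf):

conf.setNumReduceTasks(4); // each of the 4 reducers writes its own part file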
This is the behaviour expected from Hadoop: each reducer writes its own part file. But you may use MultipleOutputs to your advantage here.
Create one named output and use it in all your reducers to get the final output in one file. Its Javadoc itself suggests the following:
// Imports needed for the example:
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

JobConf conf = new JobConf();

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

conf.setMapperClass(MOMap.class);
conf.setReducerClass(MOReduce.class);
...

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);
...

JobClient jc = new JobClient();
RunningJob job = jc.submitJob(conf);
...
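Here addNamedOutput registers an extra output stream named "text" with its own output format and key/value classes; the reducers then look it up by that same name. Records written to it land in files prefixed with the named-output name (e.g. text-r-00000) rather than in the default part files.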
Usage in the Reducer is:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MOReduce implements
    Reducer<WritableComparable, Writable, WritableComparable, Writable> {

  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    ...
    mos = new MultipleOutputs(conf);
  }

  public void reduce(WritableComparable key, Iterator<Writable> values,
      OutputCollector<WritableComparable, Writable> output, Reporter reporter)
      throws IOException {
    ...
    // Route the record to the named output 'text' instead of the default output
    mos.getCollector("text", reporter).collect(key, new Text("Hello"));
    ...
  }

  public void close() throws IOException {
    mos.close(); // must be called, or buffered named-output records may be lost
    ...
  }
}
If you are using the new mapreduce API, see the org.apache.hadoop.mapreduce.lib.output.MultipleOutputs Javadoc instead.
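For reference, a minimal sketch of the same reducer against the new API (the class name MONewReduce and the key/value types are illustrative assumptions, not from the original answer):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Driver side (assumed): MultipleOutputs.addNamedOutput(job, "text",
//     TextOutputFormat.class, LongWritable.class, Text.class);
public class MONewReduce extends Reducer<Text, Text, LongWritable, Text> {

  private MultipleOutputs<LongWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<LongWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Write to the 'text' named output instead of context.write(...)
    mos.write("text", new LongWritable(1), new Text("Hello"));
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close(); // flush and close the named-output writers
  }
}

In the new API the collector lookup is replaced by mos.write(namedOutput, key, value), and setup/cleanup take over the roles of configure/close.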