How do multiple reducers output only one part-file in Hadoop?

Asked by Happy的楠姐 on 2021-01-13 22:22

In my map-reduce job, I use 4 reducers to implement the reduce phase, so the final output consists of 4 part-files: part-0000, part-0001, part-0002, part-0003. How can I get a single part-file instead?

2 Answers
  • 2021-01-13 22:52
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);
    

    Is "text" here an output directory, or a single large file named text?

  • 2021-01-13 23:01

    This isn't the default behaviour of Hadoop, but you can use MultipleOutputs to your advantage here. Create one named output and use it in all your reducers to collect the final output in a single place. Its Javadoc suggests the following:

     JobConf conf = new JobConf();

     FileInputFormat.setInputPaths(conf, inDir);
     FileOutputFormat.setOutputPath(conf, outDir);

     conf.setMapperClass(MOMap.class);
     conf.setReducerClass(MOReduce.class);
     ...

     // Defines additional single text based output 'text' for the job
     MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
         LongWritable.class, Text.class);
     ...

     JobClient jc = new JobClient();
     RunningJob job = jc.submitJob(conf);

     ...


    Usage in the Reducer is:

     public class MOReduce implements
         Reducer<WritableComparable, Writable, WritableComparable, Writable> {
       private MultipleOutputs mos;

       public void configure(JobConf conf) {
         ...
         mos = new MultipleOutputs(conf);
       }

       public void reduce(WritableComparable key, Iterator<Writable> values,
           OutputCollector<WritableComparable, Writable> output, Reporter reporter)
           throws IOException {
         ...
         // Write to the shared named output instead of the default collector
         mos.getCollector("text", reporter).collect(key, new Text("Hello"));
         ...
       }

       public void close() throws IOException {
         // MultipleOutputs must be closed, or records may be lost
         mos.close();
         ...
       }
     }
    

    If you are using the new mapreduce API, see here.
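For reference, a rough sketch of the same idea with the new org.apache.hadoop.mapreduce API (not the poster's code; the class name MOReduceNew and the key/value types are assumptions for illustration) might look like:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer using the new-API MultipleOutputs. It assumes the
// driver declared a named output on the Job, e.g.:
//   MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
//       LongWritable.class, Text.class);
public class MOReduceNew
    extends Reducer<LongWritable, Text, LongWritable, Text> {

  private MultipleOutputs<LongWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<LongWritable, Text>(context);
  }

  @Override
  protected void reduce(LongWritable key, Iterable<Text> values,
      Context context) throws IOException, InterruptedException {
    for (Text value : values) {
      // Records go to the named output "text" instead of the default output
      mos.write("text", key, value);
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flush and close the named outputs
  }
}
```

One caveat worth noting: with 4 reduce tasks, each task still writes its own file for the named output (text-r-00000 through text-r-00003) inside the job's output directory. To my knowledge, a truly single file requires either a single reduce task (job.setNumReduceTasks(1)) or merging the part-files after the job, e.g. with hadoop fs -getmerge.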
