How do multiple reducers output only one part-file in Hadoop?

Asked by Happy的楠姐 on 2021-01-13 22:22

In my map-reduce job, I use 4 reducers to implement the reduce phase, so the final output consists of 4 part-files: part-0000, part-0001, part-0002, part-0003. How can I get a single part-file instead?

2 Answers
  • 2021-01-13 22:52
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);
    

    Is "text" here an output directory, or a single large file named text?

  • 2021-01-13 23:01

    This isn't the default behaviour of Hadoop, but you can use MultipleOutputs to your advantage here. Create one named output and use it in all your reducers to collect the final output in a single place. Its Javadoc suggests the following:

     JobConf conf = new JobConf();

     FileInputFormat.setInputPaths(conf, inDir);
     FileOutputFormat.setOutputPath(conf, outDir);

     conf.setMapperClass(MOMap.class);
     conf.setReducerClass(MOReduce.class);
     ...

     // Defines additional single text based output 'text' for the job
     MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
         LongWritable.class, Text.class);
     ...

     JobClient jc = new JobClient();
     RunningJob job = jc.submitJob(conf);

     ...


    Usage in the Reducer is:

     public class MOReduce implements
         Reducer<WritableComparable, Writable, WritableComparable, Writable> {
       private MultipleOutputs mos;

       public void configure(JobConf conf) {
         ...
         mos = new MultipleOutputs(conf);
       }

       public void reduce(WritableComparable key, Iterator<Writable> values,
           OutputCollector<WritableComparable, Writable> output, Reporter reporter)
           throws IOException {
         ...
         // Write to the shared named output instead of the default collector
         mos.getCollector("text", reporter).collect(key, new Text("Hello"));
         ...
       }

       public void close() throws IOException {
         // MultipleOutputs must be closed, or records may be lost
         mos.close();
         ...
       }
     }
    

    If you are using the new mapreduce API, see here.
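For reference, a rough sketch of the same idea with the new org.apache.hadoop.mapreduce API (not the poster's code; the class name MOReduceNew and the key/value types are assumptions for illustration) might look like:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer using the new-API MultipleOutputs. It assumes the
// driver declared a named output on the Job, e.g.:
//   MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
//       LongWritable.class, Text.class);
public class MOReduceNew
    extends Reducer<LongWritable, Text, LongWritable, Text> {

  private MultipleOutputs<LongWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<LongWritable, Text>(context);
  }

  @Override
  protected void reduce(LongWritable key, Iterable<Text> values,
      Context context) throws IOException, InterruptedException {
    for (Text value : values) {
      // Records go to the named output "text" instead of the default output
      mos.write("text", key, value);
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flush and close the named outputs
  }
}
```

One caveat worth noting: with 4 reduce tasks, each task still writes its own file for the named output (text-r-00000 through text-r-00003) inside the job's output directory. To my knowledge, a truly single file requires either a single reduce task (job.setNumReduceTasks(1)) or merging the part-files after the job, e.g. with hadoop fs -getmerge.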
