In my map-reduce job I use 4 reducers, so the final output is split across 4 part files: part-00000, part-00001, part-00002, part-00003. Is there a way to get the output in a single file instead?
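For context, this is how the reducer count is set in the driver (a fragment, assuming the old mapred API; conf is the job's JobConf):

conf.setNumReduceTasks(4); // each of the 4 reducers writes its own part file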
This is the behaviour expected from Hadoop: each reducer writes its own part file. But you may use MultipleOutputs to your advantage here.
Create one named output and use it in all your reducers to get the final output in one file. Its Javadoc itself suggests the following:
// Imports needed for the example:
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

JobConf conf = new JobConf();

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

conf.setMapperClass(MOMap.class);
conf.setReducerClass(MOReduce.class);
...

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);
...

JobClient jc = new JobClient();
RunningJob job = jc.submitJob(conf);
...
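Here addNamedOutput registers an extra output stream named "text" with its own output format and key/value classes; the reducers then look it up by that same name. Records written to it land in files prefixed with the named-output name (e.g. text-r-00000) rather than in the default part files.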
Usage in the Reducer is:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MOReduce implements
    Reducer<WritableComparable, Writable, WritableComparable, Writable> {

  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    ...
    mos = new MultipleOutputs(conf);
  }

  public void reduce(WritableComparable key, Iterator<Writable> values,
      OutputCollector<WritableComparable, Writable> output, Reporter reporter)
      throws IOException {
    ...
    // Route the record to the named output 'text' instead of the default output
    mos.getCollector("text", reporter).collect(key, new Text("Hello"));
    ...
  }

  public void close() throws IOException {
    mos.close(); // must be called, or buffered named-output records may be lost
    ...
  }
}
If you are using the new mapreduce API, see the org.apache.hadoop.mapreduce.lib.output.MultipleOutputs Javadoc instead.
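For reference, a minimal sketch of the same reducer against the new API (the class name MONewReduce and the key/value types are illustrative assumptions, not from the original answer):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Driver side (assumed): MultipleOutputs.addNamedOutput(job, "text",
//     TextOutputFormat.class, LongWritable.class, Text.class);
public class MONewReduce extends Reducer<Text, Text, LongWritable, Text> {

  private MultipleOutputs<LongWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<LongWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Write to the 'text' named output instead of context.write(...)
    mos.write("text", new LongWritable(1), new Text("Hello"));
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close(); // flush and close the named-output writers
  }
}

In the new API the collector lookup is replaced by mos.write(namedOutput, key, value), and setup/cleanup take over the roles of configure/close.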