My MapReduce job processes data by date and needs to write output to a specific folder structure. The current expectation is to generate output in the following structure
You should not need a second job. I currently use MultipleOutputs to create a large number of output directories in one of my programs. Although there are upwards of 30 directories, I only need a couple of MultipleOutputs objects. This is because you set the output path when you write, so it can be determined only when needed. You only need more than one named output if you want to write in different formats (e.g. one with key Text.class and value Text.class, and another with key Text.class and value IntWritable.class).
Job setup (in the driver):
MultipleOutputs.addNamedOutput(job, "Output", TextOutputFormat.class, Text.class, Text.class);
In the reducer's setup:
mout = new MultipleOutputs<Text, Text>(context);
Calling mout in the reducer:
String key; // set to whatever the output key will be
String value; // set to whatever the output value will be
String outputFileName; // set to the path (and file-name prefix) this record should be written to
mout.write("Output", new Text(key), new Text(value), outputFileName);
You can have a piece of code compute the directory at runtime. For example, say you want to organize the output by year and month:
int year; // extract year from data
int month; // extract month from data
String baseFileName; // parent directory of all outputs from this job
String outputFileName = baseFileName + "/" + year + "/" + month;
mout.write("Output", new Text(key), new Text(value), outputFileName);
Hope this helps.
EDIT: Output file structure for the above example:
Base
    2013
        01
        02
        03
        ...
    2012
        01
        ...
    ...
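A small caveat about the path string, with a sketch of one way to handle it. Concatenating int values gives "2013/1" rather than "2013/01", and, as far as I know, MultipleOutputs treats the last segment of the path as a file-name prefix and appends a suffix such as -r-00000 to it. Assuming you want the month to be a real directory with zero padding, a helper like the one below could build the string. The class name, the method name monthlyBasePath, and the trailing "part" segment are my own choices for illustration, not anything from the Hadoop API.

```java
import java.util.Locale;

public class OutputPathSketch {

    // Hypothetical helper: builds a base output path such as "Base/2013/01/part".
    // The trailing "part" is meant to become the file-name prefix, so the
    // month ends up as a directory instead of part of the file name.
    static String monthlyBasePath(String baseDir, int year, int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        // %02d zero-pads the month so directories read 01, 02, ..., 12.
        return String.format(Locale.ROOT, "%s/%04d/%02d/part", baseDir, year, month);
    }

    public static void main(String[] args) {
        System.out.println(monthlyBasePath("Base", 2013, 1));
        System.out.println(monthlyBasePath("Base", 2012, 11));
    }
}
```

The returned string would then be passed as the last argument of mout.write(...) in place of outputFileName.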
Most likely you forgot to close mos in cleanup.
If you have a setup method in your mapper or reducer like this:
public void setup(Context context) {
    mos = new MultipleOutputs(context);
}
then you should close mos at the start of your cleanup, like this:
public void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
}
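To illustrate why a missing close() can leave output files empty or incomplete, here is a plain-Java analogy rather than Hadoop code. It assumes the record writers behind MultipleOutputs buffer data in a similar way to a BufferedWriter, which I believe is the usual cause of this symptom. The class and method names are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class CloseFlushSketch {

    // Writes one small record through a buffered writer and reports how many
    // bytes have reached the underlying sink before and after close().
    static int[] bytesBeforeAndAfterClose(String record) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        writer.write(record);
        int before = sink.size(); // the record is still sitting in the writer's buffer
        writer.close();           // close() flushes the buffer into the sink
        int after = sink.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bytesBeforeAndAfterClose("key\tvalue\n");
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close: " + sizes[1] + " bytes");
    }
}
```

In the same way, calling mos.close() in cleanup gives every writer that MultipleOutputs opened a chance to flush and finalize its file.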