Multiple output path (Java - Hadoop - MapReduce)

廉价感情. 提交于 2019-12-05 02:38:39

问题


I do two MapReduce job, and I want for the second job to be able to write my result into two different files, in two different directories. I would like something similar to FileInputFormat.addInputPath(.., multiple input path) in a sense, but for the output.

I'm completely new to MapReduce, and I have a specificity to write my code in Hadoop 0.21.0 I use context.write(..) in my Reduce step, but I don't see how to control multiple output paths...

Thanks for your time !

My reduceCode from my first job, to show you I only know how to output (it goes into a /../part* file. But now what I would like is to be able to specify two precises files for different output, depending on the key) :

public static class NormalizeReducer extends Reducer<LongWritable, NetflixRating, LongWritable, NetflixUser> {
    public void reduce(LongWritable key, Iterable<NetflixRating> values, Context context) throws IOException, InterruptedException {
        NetflixUser user = new NetflixUser(key.get());
        for(NetflixRating r : values) {
            user.addRating(new NetflixRating(r));
        }
        user.normalizeRatings();
        user.reduceRatings();
        context.write(key, user);
    }
}

EDIT: so I did the method in the last comment as you mentioned, Amar. I don't know if it's works, I have other problem with my HDFS, but before I forget let's put here my discoveries for the sake of civilization :

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

  • MultipleOutputs DOES NOT act in place of FormatOutputFormat. You define one output path with FormatOutputFormat, and then you can add many more with multiple MultipleOutputs.
  • addNamedOutput method: String namedOutput is just a word who describe.
  • You define the path actually in the write method, the String baseOutputPath arg.

回答1:


so I did the method in the last comment as you mentioned, Amar. I don't know if it's works, I have other problem with my HDFS, but before I forget let's put here my discoveries for the sake of civilization :

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

MultipleOutputs DOES NOT act in place of FormatOutputFormat. You define one output path with FormatOutputFormat, and then you can add many more with multiple MultipleOutputs. addNamedOutput method: String namedOutput is just a word who describe. You define the path actually in the write method, the String baseOutputPath arg.



来源:https://stackoverflow.com/questions/15909897/multiple-output-path-java-hadoop-mapreduce

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!