Separate output files in Hadoop MapReduce

Posted by 社会主义新天地 on 2019-12-23 05:23:12

Question


My question has probably already been asked, but I cannot find a clear answer to it.

My MapReduce job is a basic WordCount. My current output file is:

// filename : 'part-r-00000'
789  a
755  #c   
456  d
123  #b

How can I change the output filename?

Then, is it possible to have two output files:

// First output file
789  a
456  d

// Second output file
123  #b
755  #c

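Here's my reducer class:

public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    // reduce() receives every value grouped under a key as an Iterable,
    // so the signature must take Iterable<Text> to actually override it.
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}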

Here's my partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable> {

    // Keys starting with '#' go to partition 1, all other keys to partition 0.
    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if (a_key.toString().startsWith("#")) {
            return 1;
        }
        return 0;
    }
}

Thanks a lot!


Answer 1:


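In your job driver, set the number of reduce tasks to 2, so that partitions 0 and 1 both exist:

job.setNumReduceTasks(2);

From the mapper, emit the word as the key and the count as the value:

a    789
#c   755
d    456
#b   123

Write a partitioner and register it in the job configuration with job.setPartitionerClass(TweetPartitionner.class). In the partitioner, return 1 if the key starts with # and 0 otherwise.

In the reducer, swap the key and the value, so that each output line is the count followed by the word.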
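
The routing described above can be sketched without a cluster. The following is a plain-Java simulation, not Hadoop code: the class PartitionRoutingDemo and its methods are hypothetical names introduced for illustration, and the guard for a single partition is an added assumption rather than part of the original partitioner. It only mimics which reducer (and therefore which part-r-0000N file) each key would be sent to, and the key/value swap done in the reducer. A real job also sorts keys inside each reducer, which this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation; it mimics the routing only and does not call Hadoop.
class PartitionRoutingDemo {

    // Same rule as the partitioner: "#" keys go to reducer 1, everything else to reducer 0.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions < 2) {
            return 0; // assumption: with a single reducer, only partition 0 is valid
        }
        return key.startsWith("#") ? 1 : 0;
    }

    // Groups (word, count) pairs per reducer and swaps each pair into a "count<TAB>word" line.
    static List<List<String>> route(Map<String, Integer> mapperOutput, int numPartitions) {
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            files.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> entry : mapperOutput.entrySet()) {
            int partition = partitionFor(entry.getKey(), numPartitions);
            files.get(partition).add(entry.getValue() + "\t" + entry.getKey());
        }
        return files;
    }

    public static void main(String[] args) {
        // The mapper output from the answer, in emission order.
        Map<String, Integer> mapperOutput = new LinkedHashMap<>();
        mapperOutput.put("a", 789);
        mapperOutput.put("#c", 755);
        mapperOutput.put("d", 456);
        mapperOutput.put("#b", 123);

        List<List<String>> files = route(mapperOutput, 2);
        for (int i = 0; i < files.size(); i++) {
            // Reducer i writes its records to part-r-0000i.
            System.out.println(String.format("part-r-%05d", i) + " -> " + files.get(i));
        }
    }
}
```

With two partitions, the plain words end up in the first list and the # keys in the second, which corresponds to the two output files asked for in the question.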




Answer 2:


As for your other question, how to change the output file name, have a look at the MultipleOutputs class and its write method: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(java.lang.String, K, V).
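
As a rough illustration of how MultipleOutputs could be used here, the sketch below is a reducer that picks the base file name per record. It assumes the mapper emits (word, count) pairs as in Answer 1 and that Hadoop's org.apache.hadoop.mapreduce API is on the classpath. The class name NamedOutputReducer and the base names "words" and "hashtags" are hypothetical choices, not something from the question. It uses the write(key, value, baseOutputPath) overload rather than the named-output overload linked above, because that overload needs no extra registration in the driver.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: input is (word, count), output is (count, word),
// written to a file whose base name depends on the word.
public class NamedOutputReducer extends Reducer<Text, IntWritable, IntWritable, Text> {

    private MultipleOutputs<IntWritable, Text> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        // Assumed naming scheme: hashtags in one file, plain words in another.
        String baseName = word.toString().startsWith("#") ? "hashtags" : "words";
        for (IntWritable count : counts) {
            // Swap key and value, and write under the chosen base name
            // instead of the default "part" prefix.
            outputs.write(count, word, baseName);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // MultipleOutputs must be closed, otherwise records may not be flushed.
        outputs.close();
    }
}
```

With this approach the files are named after the base name plus the usual task suffix, for example words-r-00000 and hashtags-r-00000. The default part-r-* files may still be created, empty, unless the job's output format is wrapped in LazyOutputFormat.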



Source: https://stackoverflow.com/questions/17293886/separate-output-files-in-hadoop-mapreduce
