Output a list from a Hadoop MapReduce job using a custom Writable

悲哀的现实 2020-12-05 03:41

I'm trying to create a simple MapReduce job by modifying the WordCount example that ships with Hadoop.

I'm trying to output a list instead of a count of the words. The w

1 Answer
  • 2020-12-05 04:02

    You have a latent bug in your reducer: the value iterator re-uses the same IntWritable instance throughout the loop, so you should copy each value into a new object before adding it to the list, as follows:

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
                                          throws IOException, InterruptedException {
        ArrayList<IntWritable> list = new ArrayList<IntWritable>();    
        for (IntWritable val : values) {
            // IntWritable has no copy constructor, so copy the primitive via get()
            list.add(new IntWritable(val.get()));
        }
        context.write(key, new MyArrayWritable(IntWritable.class, list.toArray(new IntWritable[list.size()])));
    }
    

    This isn't actually a problem yet, because your mapper only ever emits a single value (one), so every entry in the list is identical anyway. It is something that may trip you up if you ever extend this code to emit other values.
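
    The effect of that re-use can be reproduced without a cluster. The following is only a minimal sketch in plain Java: `MutableInt` and `reusingValues` are hypothetical stand-ins for IntWritable and the reducer's value iterator, not Hadoop classes, and the real iterator's internals may differ.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as IntWritable.
    static final class MutableInt {
        private int value;
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
    }

    // Hypothetical iterable that hands out the SAME object on every step,
    // overwriting its contents, to mimic the re-use described above.
    static Iterable<MutableInt> reusingValues(int... raw) {
        return () -> new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt(0);
            private int index = 0;

            @Override public boolean hasNext() { return index < raw.length; }

            @Override public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[index++]);
                return shared;
            }
        };
    }

    // Stores references to the one shared object: every entry ends up equal.
    static List<Integer> collectByReference(int... raw) {
        List<MutableInt> list = new ArrayList<>();
        for (MutableInt val : reusingValues(raw)) {
            list.add(val);
        }
        return unwrap(list);
    }

    // Copies the primitive into a fresh object on each iteration.
    static List<Integer> collectByCopy(int... raw) {
        List<MutableInt> list = new ArrayList<>();
        for (MutableInt val : reusingValues(raw)) {
            list.add(new MutableInt(val.get()));
        }
        return unwrap(list);
    }

    private static List<Integer> unwrap(List<MutableInt> list) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt m : list) out.add(m.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(1, 2, 3)); // [3, 3, 3]
        System.out.println(collectByCopy(1, 2, 3));      // [1, 2, 3]
    }
}
```

    With a mapper that only emits 1, both variants would print the same list, which is why the problem stays hidden in the WordCount case.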

    You also need to tell the job that your map output types and your reducer output types are different:

    // map output types
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // reducer output types
    
    job.setOutputValueClass(Text.class);
    job.setOutputValueClass(MyArrayWritable.class);
    

    You might also want to set the number of reducers explicitly. If your cluster admin has set the default to 0, no reduce phase runs at all, which may be why you never see your sysouts being written to the task logs:

    job.setNumReduceTasks(1);
    

    You're using the default text output format, which calls toString() on each output key and value. MyArrayWritable doesn't override toString(), so you should add one to it:

    @Override
    public String toString() {
      return Arrays.toString(get());
    }
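
    To see why the override matters, here is a small plain-Java sketch. `Value`, `PlainArray` and `PrintableArray` are hypothetical stand-ins (for IntWritable and for MyArrayWritable without and with the override); none of them are Hadoop classes.

```java
import java.util.Arrays;

class ToStringDemo {
    // Hypothetical element type that, like IntWritable, prints its number.
    static final class Value {
        private final int n;
        Value(int n) { this.n = n; }
        @Override public String toString() { return Integer.toString(n); }
    }

    // Hypothetical container WITHOUT a toString() override.
    static class PlainArray {
        private final Value[] values;
        PlainArray(Value... values) { this.values = values; }
        Value[] get() { return values; }
    }

    // The same container WITH the override suggested above.
    static final class PrintableArray extends PlainArray {
        PrintableArray(Value... values) { super(values); }
        @Override public String toString() { return Arrays.toString(get()); }
    }

    public static void main(String[] args) {
        // Falls back to Object.toString(): class name plus an identity hash.
        System.out.println(new PlainArray(new Value(1), new Value(1)));
        // Arrays.toString() calls toString() on each element: [1, 1]
        System.out.println(new PrintableArray(new Value(1), new Value(1)));
    }
}
```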
    

    Finally, remove the overridden write method from MyArrayWritable: it is not a valid implementation compatible with the complementary readFields method. You don't need to override this method, but if you do (say, you want a sysout to verify it's being called), then do something like this instead:

    @Override
    public void write(DataOutput arg0) throws IOException {
      System.out.println("write method called");
      super.write(arg0);
    }
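
    What "compatible" means here is that readFields must consume exactly the bytes that write produced, in the same order. The sketch below illustrates that with java.io only. `IntListRecord` is a hypothetical class and its length-prefixed layout is an assumption chosen for the example, not necessarily the format the parent class uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SymmetricSerializationDemo {
    // Hypothetical record with a matching write/readFields pair.
    static final class IntListRecord {
        private int[] values = new int[0];

        IntListRecord() {}
        IntListRecord(int... values) { this.values = values.clone(); }

        int[] get() { return values.clone(); }

        // Writes the element count first, then each element.
        void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
        }

        // Reads the fields back in exactly the order write() emitted them.
        void readFields(DataInput in) throws IOException {
            int length = in.readInt();
            values = new int[length];
            for (int i = 0; i < length; i++) values[i] = in.readInt();
        }
    }

    // Serializes to a byte buffer and deserializes into a fresh instance.
    static int[] roundTrip(int... values) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new IntListRecord(values).write(new DataOutputStream(buffer));
            IntListRecord copy = new IntListRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(1, 1, 1))); // [1, 1, 1]
    }
}
```

    If write skipped the length prefix, or wrote the fields in another order, readFields would misread the stream. Delegating to super.write avoids that risk.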
    