Sorted word count using Hadoop MapReduce

鱼传尺愫 2020-12-16 02:37

I'm very new to MapReduce and have completed a Hadoop word-count example.

In that example it produces an unsorted file (with key-value pairs) of word counts. So is i

4 answers
  •  无人及你
    2020-12-16 03:12

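    In Hadoop, sorting is done between the Map and the Reduce phases, and it sorts by key. One approach to sorting by word occurrence is to run a follow-up job over the word-count output whose mapper swaps each pair so that the count becomes the key. A custom group comparator that doesn't group anything then ensures that every call to reduce receives just one key and one value.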

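    // Old "mapred" API. Each public class below belongs in its own .java file.
    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class Program {
       public static void main( String[] args) throws IOException {
          JobConf conf = new JobConf( Program.class);
          // Assumption: args[0] is the word-count output stored as a SequenceFile
          // of (Text, IntWritable) pairs; args[1] is the output directory.
          conf.setInputFormat( SequenceFileInputFormat.class);
          FileInputFormat.setInputPaths( conf, new Path( args[0]));
          FileOutputFormat.setOutputPath( conf, new Path( args[1]));

          conf.setOutputKeyClass( IntWritable.class);
          conf.setOutputValueClass( Text.class);
          conf.setMapperClass( Map.class);
          conf.setReducerClass( IdentityReducer.class);
          conf.setOutputValueGroupingComparator( GroupComparator.class);
          conf.setNumReduceTasks( 1);   // a single reducer yields one globally sorted file
          JobClient.runJob( conf);
       }
    }

    public class Map extends MapReduceBase implements Mapper<Text, IntWritable, IntWritable, Text> {

       // Swap each (word, count) pair so the count becomes the sort key.
       public void map( Text key, IntWritable value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
           output.collect( value, key);
       }
    }

    public class GroupComparator extends WritableComparator {
        protected GroupComparator() {
            super( IntWritable.class, true);
        }

        // Never report two keys as equal, so no values are grouped together.
        public int compare( WritableComparable w1, WritableComparable w2) {
            return -1;
        }
    }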
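
To see what the swap-and-sort idea does to the data without a cluster, here is a minimal plain-Java sketch. It uses no Hadoop classes; the class name `SwapAndSortSketch` and its methods are hypothetical and only mimic the two steps: the mapper swapping (word, count) into (count, word), and the shuffle ordering records by key. The tie-break on the word is an assumption added to make the output deterministic; Hadoop itself does not promise an order among values that share a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: simulates the data flow, not the Hadoop runtime.
class SwapAndSortSketch {

    // "Map" step: turn each (word, count) record into (count, word).
    static List<Map.Entry<Integer, String>> swap(Map<String, Integer> wordCounts) {
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            swapped.add(Map.entry(e.getValue(), e.getKey()));
        }
        return swapped;
    }

    // "Shuffle" step: order the swapped records by key (the count), ascending.
    // Ties are broken by word purely for deterministic output (assumption).
    static List<String> sortByCount(Map<String, Integer> wordCounts) {
        List<Map.Entry<Integer, String>> records = swap(wordCounts);
        records.sort((a, b) -> {
            int byCount = Integer.compare(a.getKey(), b.getKey());
            return byCount != 0 ? byCount : a.getValue().compareTo(b.getValue());
        });
        // Format each record as "count<TAB>word", one record per line.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, String> r : records) {
            lines.add(r.getKey() + "\t" + r.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("the", 3, "fox", 1, "dog", 2);
        for (String line : sortByCount(counts)) {
            System.out.println(line);
        }
    }
}
```

The result is ascending by count, which matches the default ordering of `IntWritable` keys. For a most-frequent-first listing, the comparison on the count would need to be reversed.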
    
