I'm very new to MapReduce and have completed a Hadoop word-count example.
That example produces an unsorted file of word counts (key-value pairs). So is i
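In Hadoop, sorting happens between the map and reduce phases: during the shuffle, map output is sorted by key. To sort by word occurrence, run a second job over the word-count output whose mapper swaps each (word, count) pair to (count, word), so the count becomes the key. One approach is then to use a custom grouping comparator that never groups keys together, so every call to reduce receives just one key and one value.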
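To see why swapping key and value is enough, here is a plain-Java simulation of the idea, with no Hadoop involved. `swap` stands in for the mapper and `sortByKey` stands in for the shuffle's sort by key; both names are hypothetical, and this is only a sketch of the behavior, not how Hadoop implements it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShuffleSortSketch {
    // Hypothetical stand-in for the mapper: emit (count, word) instead of (word, count).
    static List<Map.Entry<Integer, String>> swap(Map<String, Integer> wordCounts) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            out.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        return out;
    }

    // Hypothetical stand-in for the shuffle: records are ordered by key (ascending).
    static List<Map.Entry<Integer, String>> sortByKey(List<Map.Entry<Integer, String>> pairs) {
        List<Map.Entry<Integer, String>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("map", 1);
        counts.put("reduce", 2);
        // Prints the words in increasing order of count: map, reduce, hadoop.
        for (Map.Entry<Integer, String> e : sortByKey(swap(counts))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job, the relative order of words that share the same count is not guaranteed. The corresponding Hadoop job (old `org.apache.hadoop.mapred` API) follows.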
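    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class Program {
        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(Program.class);
            // Assumes the word-count job wrote a SequenceFile of (Text word, IntWritable count).
            conf.setInputFormat(SequenceFileInputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputValueClass(Text.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setOutputValueGroupingComparator(GroupComparator.class);
            conf.setNumReduceTasks(1); // a single reducer yields one globally sorted file
            JobClient.runJob(conf);
        }

        // Swaps (word, count) to (count, word) so the shuffle sorts by count.
        public static class Map extends MapReduceBase
                implements Mapper<Text, IntWritable, IntWritable, Text> {
            public void map(Text key, IntWritable value,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
                    throws IOException {
                output.collect(value, key);
            }
        }

        // Never reports two keys as equal, so each reduce() call gets one key and one value.
        public static class GroupComparator extends WritableComparator {
            public GroupComparator() {
                super(IntWritable.class, true);
            }
            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                return -1;
            }
        }
    }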
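The effect of the grouping comparator can be sketched in plain Java as well. The model below is a simplification, not Hadoop's actual code, and `groupKeys` is a hypothetical name. It walks the key-sorted records and starts a new reduce() call whenever the grouping comparator says the next key differs from the previous one. Each element in a group stands for one record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    // Simplified model: split key-sorted records into reduce() calls using a grouping comparator.
    static List<List<Integer>> groupKeys(List<Integer> sortedKeys, Comparator<Integer> grouping) {
        List<List<Integer>> calls = new ArrayList<>();
        List<Integer> current = null;
        Integer previous = null;
        for (Integer key : sortedKeys) {
            // Start a new call for the first record or when the comparator reports a different key.
            if (current == null || grouping.compare(previous, key) != 0) {
                current = new ArrayList<>();
                calls.add(current);
            }
            current.add(key);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(1, 2, 2, 3);
        // Default-style grouping: equal keys share one call -> [[1], [2, 2], [3]]
        System.out.println(groupKeys(keys, Comparator.naturalOrder()));
        // Comparator that always returns -1: one call per record -> [[1], [2], [2], [3]]
        System.out.println(groupKeys(keys, (a, b) -> -1));
    }
}
```

With `IdentityReducer` the written output is the same either way, because it emits every value it receives. The grouping comparator only changes how many reduce() calls are made.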