I would like to know why grouping comparator is used in secondary sort of mapreduce.
According to the definitive guide example of secondary sorting
We want t
The default partitioner calculates the hash of the key, and those keys which has the same hash value will be sent to the same reducer. If you have a composite(natural+augment) key emitted in your mapper and if you want to send the keys which has the same natural key to the same reducer then you have to implement a custom partitioner.
public class SimplePartitioner implements Partitioner {
@Override
public int getPartition(Text compositeKey, LongWritable value, int numReduceTasks) {
//Split the key into natural and augment
String naturalKey = compositeKey.toString().split("separator")
return naturalKey.hashCode();
}
}
And now if you want all your relevant rows within a partition of data are sent to a single reducer you must also implement a grouping comparator which considers only the natural key
public class SimpleGroupingComparator extends WritableComparator {
@Override
public int compare(Text compositeKey1, Text compositeKey2) {
return compare(compositeKey1.getNaturalKey(),compositeKey2.getNaturalKey());
}
}