I would like to know why a grouping comparator is used in the secondary sort of MapReduce.
According to the definitive guide example of secondary sorting
We want t
Let me improve the statement "... take care of the map output keys going to particular reducer".
Reducer instance vs. reduce method: One JVM is created per reduce task, and each of these has a single instance of the Reducer class. This is the Reducer instance (I'll call it the Reducer from now on). Within each Reducer, the reduce method is called multiple times, depending on 'key grouping'. Each time reduce is called, 'valuein' holds the map output values grouped by the key you define in the 'grouping comparator'. By default, the grouping comparator uses the entire map output key.
In the example, the map output key is changed to 'year and temperature' to achieve sorting. Unless you define a grouping comparator that uses only the 'year' part of the map output key, you can't make all records of the same year go to the same reduce method call.
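To make the effect concrete, here is a minimal plain-Java sketch that imitates the grouping step outside of Hadoop. It is not Hadoop's implementation: the names GroupingSketch, YearTemp, SORT, GROUP_BY_YEAR and reduceCalls are made up for illustration, and the sample temperatures are invented. It sorts composite keys by year, then by temperature descending, and starts a new simulated reduce call whenever the grouping comparator says two adjacent keys differ. In a real job the grouping comparator would be registered with Job.setGroupingComparatorClass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GroupingSketch {
    // Hypothetical composite map output key: (year, temperature).
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Sort order over the whole key: year ascending, then temperature descending.
    // It also stands in for the default grouping, which compares the entire key.
    static final Comparator<YearTemp> SORT = (a, b) -> {
        int byYear = Integer.compare(a.year, b.year);
        return byYear != 0 ? byYear : Integer.compare(b.temp, a.temp);
    };

    // Grouping comparator that looks only at the year part of the key.
    static final Comparator<YearTemp> GROUP_BY_YEAR =
            (a, b) -> Integer.compare(a.year, b.year);

    // Sorts the keys, then opens a new group (one simulated reduce call)
    // each time the grouping comparator reports that adjacent keys differ.
    static List<List<YearTemp>> reduceCalls(List<YearTemp> keys,
                                            Comparator<YearTemp> grouping) {
        List<YearTemp> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<YearTemp>> calls = new ArrayList<>();
        YearTemp previous = null;
        for (YearTemp key : sorted) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<YearTemp> keys = List.of(
                new YearTemp(1900, 35), new YearTemp(1901, 10),
                new YearTemp(1900, 34), new YearTemp(1901, 36));

        // Grouping on the entire key: every distinct key gets its own call.
        System.out.println(reduceCalls(keys, SORT).size());          // prints 4

        // Grouping on year only: one call per year, and because of the sort
        // order the first record in each call carries that year's maximum.
        List<List<YearTemp>> byYear = reduceCalls(keys, GROUP_BY_YEAR);
        System.out.println(byYear.size());                           // prints 2
        for (List<YearTemp> call : byYear) {
            System.out.println(call.get(0).year + " " + call.get(0).temp);
        }
    }
}
```

With grouping on the full key, the four records end up in four separate calls, so no single call sees all temperatures of a year. With grouping on year only, there are two calls, and the first record in each is the maximum for that year (35 for 1900, 36 for 1901 in this made-up data).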