Question
What are the differences between Sort Comparator and Group Comparator in Hadoop?
Answer 1:
To understand GroupComparator, see my answer to this question:
What is the use of grouping comparator in hadoop map reduce
SortComparator: used to define how map output keys are sorted.
Excerpt from the book Hadoop: The Definitive Guide:
Sort order for keys is found as follows:
If the property mapred.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API the equivalent method is setOutputKeyComparatorClass() on JobConf.)
Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
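The fallback order in that excerpt can be sketched in plain Java. This is only an illustration of the three-step lookup, not Hadoop's code: the class SortOrderLookup, the REGISTRY map, and the resolve() method are hypothetical names, and java.util.Comparator stands in for Hadoop's byte-level RawComparator.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup order described above; no Hadoop classes involved.
class SortOrderLookup {

    // Stand-in for the per-key-class comparator registry (an assumption for illustration).
    static final Map<Class<?>, Comparator<?>> REGISTRY = new HashMap<>();

    @SuppressWarnings("unchecked")
    static <K extends Comparable<K>> Comparator<K> resolve(Comparator<K> explicit,
                                                           Class<K> keyClass) {
        // Step 1: an explicitly configured comparator always wins.
        if (explicit != null) {
            return explicit;
        }
        // Step 2: otherwise use the comparator registered for the key class, if any.
        Comparator<?> registered = REGISTRY.get(keyClass);
        if (registered != null) {
            return (Comparator<K>) registered;
        }
        // Step 3: otherwise fall back to the key's own compareTo().
        return Comparator.naturalOrder();
    }

    public static void main(String[] args) {
        // Nothing configured or registered yet, so compareTo() decides.
        System.out.println(resolve(null, String.class).compare("a", "b") < 0);
    }
}
```

In real Hadoop the last step also has to deserialize the raw bytes before calling compareTo(), which is why a registered raw comparator is usually faster.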
SortComparator vs. GroupComparator in one line:
SortComparator decides how map output keys are sorted, while GroupComparator decides which map output keys within the Reducer go to the same reduce() method call.
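To make the distinction concrete, here is a small plain-Java simulation, with no Hadoop dependency, of what happens on the reduce side. Everything in it is hypothetical and chosen for illustration: the composite key (station, temp), the class ComparatorDemo, and the two comparators SORT and GROUP. It assumes the usual secondary-sort setup, where the sort comparator looks at the whole key and the group comparator looks only at part of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of sort vs. group comparators; not Hadoop API code.
class ComparatorDemo {

    // Hypothetical composite map output key: (station, temperature).
    static final class Key {
        final String station;
        final int temp;

        Key(String station, int temp) {
            this.station = station;
            this.temp = temp;
        }

        @Override
        public String toString() {
            return station + ":" + temp;
        }
    }

    // Plays the role of the sort comparator: station ascending, then temperature descending.
    static final Comparator<Key> SORT = (a, b) -> {
        int byStation = a.station.compareTo(b.station);
        return byStation != 0 ? byStation : Integer.compare(b.temp, a.temp);
    };

    // Plays the role of the group comparator: only the station is compared.
    static final Comparator<Key> GROUP = (a, b) -> a.station.compareTo(b.station);

    // Sorts the keys, then opens a new group (one simulated reduce() call)
    // whenever the group comparator says two adjacent keys differ.
    static List<List<Key>> reduceCalls(List<Key> mapOutputKeys) {
        List<Key> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);
        List<List<Key>> calls = new ArrayList<>();
        Key previous = null;
        for (Key key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("B", 10), new Key("A", 5), new Key("A", 30), new Key("B", 25));
        for (List<Key> call : reduceCalls(keys)) {
            System.out.println("reduce(" + call.get(0) + ") sees " + call);
        }
    }
}
```

With this input there are two simulated reduce() calls, one per station, and inside each call the keys arrive with the highest temperature first. The grouping only works because the sort order already places keys of the same station next to each other; the group comparator only ever compares adjacent keys.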
Answer 2:
The group comparator decides which map output keys are grouped together into a single key, so the value collections belonging to those keys are merged as well. Usually the first key of the group is the one used as the key for the merged collection.
The sort comparator decides how keys are sorted in the input to reduce. By default it uses the keys' natural ordering.
Source: https://stackoverflow.com/questions/16184745/what-are-the-differences-between-sort-comparator-and-group-comparator-in-hadoop