Question
My job doesn't require sorting, just aggregating information per key. So I wonder whether it is possible to disable sorting entirely in order to improve performance.
Note: I can't set the number of reducers to zero because I need to aggregate data from many mappers. I'm just not interested in having the results sorted within each reducer.
Answer 1:
One of the main purposes of sorting the map output is this: when the tuples reach the reducer, the reducer has to build a (key, list of values) pair to invoke the reduce task. With sorted map output it can build each list with a simple sequential scan (whenever it sees a different key, it just starts a new list). If the map output were not sorted, it would have to scan the whole list to collect all the values for the same key.
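The sequential-scan grouping described above can be sketched in plain Java. This is a toy model, not Hadoop code: the class `SortedGroupingSketch`, its `reduce` helper (which simply sums integers) and the sample data are hypothetical, and `List.sort` stands in for the framework's sort phase. It assumes Java 9 or later for `List.of` and `Map.entry`. The point it shows is that once the pairs are sorted by key, a single pass is enough to hand each key and its values to a reduce-style function, starting a new group whenever the key changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SortedGroupingSketch {

    // Hypothetical stand-in for a reduce() call: sums all values for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups (key, value) pairs that are already sorted by key in one
    // sequential pass: a new group starts whenever the key changes.
    static List<String> groupSortedAndReduce(List<Map.Entry<String, Integer>> sorted) {
        List<String> results = new ArrayList<>();
        String currentKey = null;
        List<Integer> currentValues = new ArrayList<>();
        for (Map.Entry<String, Integer> pair : sorted) {
            if (currentKey != null && !currentKey.equals(pair.getKey())) {
                // Key changed: the previous group is complete, hand it to reduce().
                results.add(currentKey + "=" + reduce(currentKey, currentValues));
                currentValues = new ArrayList<>();
            }
            currentKey = pair.getKey();
            currentValues.add(pair.getValue());
        }
        if (currentKey != null) {
            // Flush the last group.
            results.add(currentKey + "=" + reduce(currentKey, currentValues));
        }
        return results;
    }

    public static void main(String[] args) {
        // Map output collected from several mappers, in arbitrary order.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>(List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3),
                Map.entry("c", 4), Map.entry("a", 5)));
        // Sort by key, playing the role of the framework's sort phase.
        mapOutput.sort(Map.Entry.comparingByKey());
        System.out.println(groupSortedAndReduce(mapOutput)); // prints [a=7, b=4, c=4]
    }
}
```

If the sort line is removed, the same single pass emits `a` and `b` more than once (groups b, a, b, c, a), which is why unsorted input would need a full scan per key instead.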
Answer 2:
No. Sorting in MapReduce is performed essentially for internal purposes, not to make the end results sorted.
Sorted input makes it efficient to build the list of values for each unique key, which is passed to the reduce() function as its Iterable<Values> argument.
Answer 3:
Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)).
and
The number of reducers can be set to 0 in the driver class with job.setNumReduceTasks(0). This means there is no reduce phase, only a map phase; such a job is called a map-only job.
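To see what a map-only job gives up compared with one that has a reduce phase, here is a toy plain-Java model. It is not Hadoop's API: `MapOnlySketch`, its word-count `map` function and the input splits are hypothetical, and it assumes Java 9 or later. In the map-only path each mapper's output is kept exactly as emitted, so the same key can appear in several mappers' outputs; in the reduce path a `TreeMap` stands in for the shuffle and sort that bring all values for a key together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOnlySketch {

    // Hypothetical word-count mapper: emits (word, 1) for every word in its split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Zero reducers: one output list per mapper, kept as emitted,
    // with no shuffle, sort or merge across mappers.
    static List<List<Map.Entry<String, Integer>>> mapOnly(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> outputs = new ArrayList<>();
        for (String split : splits) {
            outputs.add(map(split));
        }
        return outputs;
    }

    // With a reduce phase: pairs from all mappers are combined per key and summed.
    // The TreeMap is a stand-in for the framework's shuffle and sort.
    static Map<String, Integer> withReduce(List<String> splits) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String split : splits) {
            for (Map.Entry<String, Integer> pair : map(split)) {
                totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a b a", "b c");
        System.out.println(mapOnly(splits));    // prints [[a=1, b=1, a=1], [b=1, c=1]]
        System.out.println(withReduce(splits)); // prints {a=2, b=2, c=1}
    }
}
```

In the map-only output, `b` shows up in both mappers' results unaggregated, which is why zero reducers do not help when values must be combined across mappers.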
Source: https://stackoverflow.com/questions/9074910/is-it-possible-to-disable-sorting-in-hadoop