Is it possible to disable sorting in Hadoop?

Question

My job doesn't require sorting, just aggregating information per key. So I wonder whether it is possible to disable sorting of all the data in order to increase performance.

Note: I can't set the number of reducers to zero, because I need to aggregate data across many mappers. I'm just not interested in a sorted result within one reducer.

Answer 1:

One of the main purposes of sorting the map output is grouping. When the tuples reach the reducer, it has to build a (key, list of values) pair for each key before it can invoke the reduce task. With sorted map output, it can build each list with a single sequential scan (whenever it sees a different key, it simply starts a new list). If the map output were not sorted, it would have to scan the whole list to collect all the values that share the same key.
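To illustrate the sequential scan described above, here is a minimal sketch in plain Java. It uses no Hadoop classes, and the class and method names are hypothetical. It assumes the pairs are already sorted by key, as map output is when it reaches the reducer, and builds each (key, list of values) pair in one pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Hadoop code: groups a key-sorted list of
// (key, value) pairs into (key, list of values) pairs in a single pass.
class SortedGrouping {

    // Returns one entry per run of equal keys, in input order.
    // Assumes `sorted` is already sorted by key.
    static List<Map.Entry<String, List<Integer>>> groupSorted(
            List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, List<Integer>>> groups = new ArrayList<>();
        String currentKey = null;
        List<Integer> currentValues = null;
        for (Map.Entry<String, Integer> pair : sorted) {
            // A different key means the previous group is complete, so start
            // a new list; earlier pairs never need to be revisited.
            if (currentKey == null || !currentKey.equals(pair.getKey())) {
                currentKey = pair.getKey();
                currentValues = new ArrayList<>();
                groups.add(Map.entry(currentKey, currentValues));
            }
            currentValues.add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("apple", 1), Map.entry("apple", 3),
                Map.entry("banana", 2), Map.entry("cherry", 5),
                Map.entry("cherry", 4));
        for (Map.Entry<String, List<Integer>> g : groupSorted(mapOutput)) {
            // Prints: apple -> [1, 3], banana -> [2], cherry -> [5, 4]
            System.out.println(g.getKey() + " -> " + g.getValue());
        }
    }
}
```

If the input is not sorted, the same single pass breaks down: for the keys a, b, a it produces three groups instead of two, which is why unsorted input would need a full scan per key.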

Answer 2:

No. Sorting in MapReduce is essentially performed for internal purposes, not so that the end results are sorted.
Sorted input ensures good performance when building the list of values for each unique key, which is passed as the values argument (an Iterable of values) when the reduce() function is called.
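A rough simulation of the reduce side follows, again in plain Java, assuming integer values and a sum as the per-key aggregation. ReduceFn is a hypothetical interface whose shape loosely mirrors Hadoop's Reducer.reduce(KEYIN key, Iterable<VALUEIN> values, Context context); it is not the Hadoop API. The sort step stands in for what the framework does before the reducer runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the reduce side, not the Hadoop API.
class ReduceSideSketch {

    // Called once per unique key with all values for that key.
    interface ReduceFn {
        int reduce(String key, Iterable<Integer> values);
    }

    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput, ReduceFn fn) {
        // Sort by key so equal keys become adjacent (the framework's job in Hadoop).
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Map.Entry.comparingByKey());

        Map<String, Integer> results = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            List<Integer> values = new ArrayList<>();
            // Collect the run of adjacent pairs that share this key.
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                values.add(sorted.get(i).getValue());
                i++;
            }
            results.put(key, fn.reduce(key, values));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 2), Map.entry("a", 1), Map.entry("b", 5), Map.entry("a", 4));
        // Per-key aggregation: sum the values for each key.
        Map<String, Integer> sums = run(mapOutput, (key, values) -> {
            int total = 0;
            for (int v : values) total += v;
            return total;
        });
        System.out.println(sums); // {a=5, b=7}
    }
}
```

The reduce function never sees unsorted data: by the time it is called, every value for its key has already been collected into one list.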

Answer 3:

Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers. The number of reducers can be set to 0 in the driver class with job.setNumReduceTasks(0). The job then has no reduce phase, only a map phase; this is called a map-only job.
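The difference between a map-only job and a job with reducers can be modeled roughly as below. This is a hypothetical, single-mapper simplification in plain Java: numReduceTasks stands for the value passed to job.setNumReduceTasks(...), the partition formula is the one used by Hadoop's default HashPartitioner (applied here to String.hashCode()), and a real job would also spill, merge, and transfer data between nodes. As the question points out, zero reducers only fits when no aggregation across mappers is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of where map output goes, not Hadoop code.
class MapOnlySketch {

    // numReduceTasks mirrors the value given to job.setNumReduceTasks(...).
    // Returns one list per output (one per reducer, or one for the mapper).
    static List<List<Map.Entry<String, Integer>>> route(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<List<Map.Entry<String, Integer>>> outputs = new ArrayList<>();
        if (numReduceTasks == 0) {
            // Map-only job: records are written as emitted; no partitioning, no sort.
            outputs.add(new ArrayList<>(mapOutput));
            return outputs;
        }
        for (int r = 0; r < numReduceTasks; r++) {
            outputs.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            // Same formula as Hadoop's default HashPartitioner.
            int partition = (pair.getKey().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            outputs.get(partition).add(pair);
        }
        // Each reducer's input is sorted by key before reduce() is called.
        for (List<Map.Entry<String, Integer>> partition : outputs) {
            partition.sort(Map.Entry.comparingByKey());
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
        System.out.println(route(mapOutput, 0)); // [[b=1, a=2, b=3]]
        System.out.println(route(mapOutput, 1)); // [[a=2, b=1, b=3]]
    }
}
```

With zero reducers the records keep the order in which the mapper emitted them; as soon as there is at least one reducer, the sort is part of the pipeline.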

Source: https://stackoverflow.com/questions/9074910/is-it-possible-to-disable-sorting-in-hadoop

Tags

Hadoop

MapReduce
