Sorting large data using MapReduce/Hadoop

被撕碎了的回忆 2020-12-13 04:02

I am reading about MapReduce and the following thing is confusing me.

Suppose we have a file with 1 million entries (integers) and we want to sort them using MapReduce.

  •  失恋的感觉
    2020-12-13 04:39

    Check out merge sort.

    It turns out that merging lists that are already sorted is much cheaper, in both operations and memory consumption, than sorting the complete list from scratch.
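    A rough comparison of the costs, assuming the N integers are split evenly into k sorted runs of N/k elements each (k and the even split are assumptions introduced here) and counting comparisons:

```latex
\begin{aligned}
\text{sort all } N \text{ elements from scratch:} &\quad O(N \log N) \\
\text{each of } k \text{ mappers sorts } N/k \text{ elements:} &\quad O\!\left(\tfrac{N}{k} \log \tfrac{N}{k}\right) \text{ per mapper} \\
\text{reducer merges } k \text{ sorted runs (with a heap):} &\quad O(N \log k)
\end{aligned}
```

    For a fixed k the merge step is linear in N. Summed over all mappers, k \cdot (N/k) \log(N/k) = N \log(N/k), and adding the merge's N \log k gives N \log N again, so the gain is not fewer total comparisons. It comes from spreading the sorting across machines and from the reducer streaming through its inputs while holding only one element per run in memory.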

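    If the reducer gets 4 sorted lists, it only needs to compare the first (smallest) remaining element of each of the 4 lists, output the smallest one, advance in that list, and repeat. If the number of lists is constant, this reduce step is an O(N) operation; with k lists in general it is O(N log k) using a heap.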
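    A minimal Python sketch of that reduce step, assuming the reducer receives its k sorted runs as iterables of integers (the function name `k_way_merge` and the sample data are hypothetical). A heap holds one head element per run, so each output element costs O(log k) and only k elements are held at any time, however long the runs are.

```python
import heapq
from typing import Iterable, Iterator, List

_EXHAUSTED = object()  # sentinel marking a run with no elements left


def k_way_merge(sorted_runs: List[Iterable[int]]) -> Iterator[int]:
    """Merge already-sorted runs by repeatedly emitting the smallest head element."""
    iters = [iter(run) for run in sorted_runs]
    # Heap entries are (head value, run index); the index breaks ties.
    heap = []
    for i, it in enumerate(iters):
        first = next(it, _EXHAUSTED)
        if first is not _EXHAUSTED:
            heap.append((first, i))
    heapq.heapify(heap)
    while heap:
        value, i = heapq.heappop(heap)  # smallest head among all runs
        yield value
        nxt = next(iters[i], _EXHAUSTED)  # advance only the run we took from
        if nxt is not _EXHAUSTED:
            heapq.heappush(heap, (nxt, i))


# Four sorted runs, as if produced by four mappers (hypothetical data).
runs = [[1, 5, 9], [2, 6], [0, 3, 7, 8], [4]]
print(list(k_way_merge(runs)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

    Python's standard library already provides this as `heapq.merge`, which would be the practical choice outside of an illustration; the hand-written version just makes the "pick the smallest head" loop visible.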

    Also, the reducers are typically "distributed" in something like a tree, so the work can be parallelized too.
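    One way to picture the tree arrangement is a binary merge tree: sorted runs are merged in pairs, then the results are merged in pairs again, until one list remains. This is a sequential sketch under that assumption (`merge_two` and `tree_merge` are hypothetical names); the answer does not say how a real framework shapes its merge tree.

```python
from typing import List


def merge_two(a: List[int], b: List[int]) -> List[int]:
    """Standard two-way merge of two sorted lists."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def tree_merge(runs: List[List[int]]) -> List[int]:
    """Merge sorted runs pairwise, level by level, like a binary tree.

    The merges within one level do not depend on each other, so in a
    distributed setting each could run on a different worker; here they
    simply run one after another.
    """
    if not runs:
        return []
    level = runs
    while len(level) > 1:
        next_level = []
        for k in range(0, len(level), 2):
            if k + 1 < len(level):
                next_level.append(merge_two(level[k], level[k + 1]))
            else:
                next_level.append(level[k])  # odd run out moves up unchanged
        level = next_level
    return level[0]


runs = [[1, 5, 9], [2, 6], [0, 3, 7, 8], [4]]
print(tree_merge(runs))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

    With k runs the tree has about log2(k) levels, and because the merges at the same level are independent, they are the part of the work that could be parallelized.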
