I am reading about MapReduce and the following thing is confusing me.
Suppose we have a file with 1 million entries(integers) and we want to sort them using MapReduc
Check out merge-sort.
It turns out that sorting partially sorted lists is much more efficient in terms of operations and memory consumption than sorting the complete list.
If the reducer gets 4 sorted lists it only needs to look for the smallest element of the 4 lists and pick that one. If the number of lists is constant this reducing is an O(N) operation.
Also typically the reducers are also "distributed" in something like a tree, so the work can be parrallelized too.