When do reduce tasks start in Hadoop?

深忆病人 2020-11-27 10:04

In Hadoop when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typ

8 answers
  •  臣服心动
    2020-11-27 10:24

    Consider a WordCount example to better understand how map and reduce tasks work. Suppose we have a large file, say a novel, and our task is to find the number of times each word occurs in it. Since the file is large, it may be divided into blocks that are replicated across different worker nodes. The word count job is composed of map and reduce tasks. Each map task takes one block as input and produces intermediate key-value pairs. In this example, since we are counting the number of occurrences of words, a mapper processing a block produces intermediate results of the form (word1,count1), (word2,count2), etc. The intermediate results of all the mappers are passed through a shuffle phase, which regroups them by key.
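The map step described above can be sketched in plain Python, without Hadoop. This is a minimal illustration, not the Hadoop API: `map_block` and the two sample blocks are hypothetical names and data. It also sums counts locally within each block so the output matches the (word,count) form used in this answer; a typical Hadoop WordCount mapper instead emits (word,1) and leaves the local summing to a combiner.

```python
from collections import Counter

def map_block(block_text):
    """Hypothetical mapper: count the words in one block.

    Returns (word, count) pairs sorted by word.
    """
    words = block_text.lower().split()
    return sorted(Counter(words).items())

# Two made-up blocks standing in for pieces of a large file.
blocks = ["is was is and", "my is was"]

# One map task per block; each produces its own intermediate pairs.
map_outputs = [map_block(block) for block in blocks]

for i, output in enumerate(map_outputs, start=1):
    print(f"Map {i}: {output}")
```

Each block is processed independently, which is why the map tasks can run in parallel on different worker nodes.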

    Assume that our map output from different mappers is of the following form:

    Map 1: (is,24) (was,32) (and,12)

    Map 2: (my,12) (is,23) (was,30)

    The map outputs are sorted and partitioned so that all values for the same key are given to the same reducer. Here, that means every pair with the key "is" goes to one reducer, every pair with the key "was" goes to one reducer, and so on. It is the reducer that produces the final output, which in this case would be: (and,12) (is,47) (my,12) (was,62)
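The shuffle and reduce steps for the two map outputs above can be sketched in plain Python. This is only an illustration under simplifying assumptions: `shuffle` and `reduce_counts` are hypothetical helper names, everything runs in a single process, and a single sorted dictionary stands in for the partitioning across reducers that Hadoop performs.

```python
from collections import defaultdict

# Map outputs taken from the example in this answer.
map1 = [("is", 24), ("was", 32), ("and", 12)]
map2 = [("my", 12), ("is", 23), ("was", 30)]

def shuffle(*map_outputs):
    """Group values by key and sort the keys, as the shuffle phase does."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_counts(grouped):
    """Sum the list of counts collected for each key."""
    return [(key, sum(values)) for key, values in grouped.items()]

grouped = shuffle(map1, map2)   # {'and': [12], 'is': [24, 23], 'my': [12], 'was': [32, 30]}
result = reduce_counts(grouped)
print(result)                   # [('and', 12), ('is', 47), ('my', 12), ('was', 62)]
```

The reducer for a key cannot produce its final sum until it has received that key's values from every mapper, which is why the reduce function itself runs only after all map outputs are available.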
