What runs first: the partitioner or the combiner?

星月不相逢 2020-12-29 13:21

Between the partitioner and the combiner, which runs first?

I was of the opinion that the partitioner runs first, then the combiner, and then the keys are redirecte

8 answers
  •  天涯浪人
    2020-12-29 14:16

    Partition comes first.

    According to "Hadoop: The Definitive Guide", the Mapper's output is first written to an in-memory buffer and then spilled to a local directory when the buffer is about to overflow. The spilled data is divided according to the Partitioner, and within each partition the records are sorted and then combined if a Combiner is given.
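The order described above (partition, then sort, then combine within each partition) can be sketched as a small standalone simulation. This is not Hadoop code: `partition`, `combine`, and `spill` are hypothetical names, the CRC32-based partitioner is only a stand-in for whatever Partitioner the job uses, and the combiner assumes a word-count style sum.

```python
from collections import defaultdict
from zlib import crc32


def partition(key, num_partitions):
    """Stand-in partitioner (assumption): stable hash of the key modulo the partition count."""
    return crc32(key.encode("utf-8")) % num_partitions


def combine(values):
    """Stand-in combiner for word count: sum the partial counts of one key."""
    return sum(values)


def spill(buffer, num_partitions):
    """Simulate one spill: partition first, then sort and combine per partition."""
    # Step 1: assign every buffered (key, value) record to a partition.
    partitions = defaultdict(list)
    for key, value in buffer:
        partitions[partition(key, num_partitions)].append((key, value))

    spilled = {}
    for pid, records in partitions.items():
        # Step 2: sort the records by key within the partition.
        records.sort(key=lambda kv: kv[0])
        # Step 3: group equal keys and run the combiner on each group.
        grouped = defaultdict(list)
        for key, value in records:
            grouped[key].append(value)
        spilled[pid] = [(key, combine(values)) for key, values in grouped.items()]
    return spilled


if __name__ == "__main__":
    # Illustrative input with repeated words so the combiner has something to merge.
    words = "the quick brown fox jumped over the lazy dog and the fox".split()
    map_output = [(word, 1) for word in words]
    for pid, records in sorted(spill(map_output, num_partitions=2).items()):
        print(pid, records)
```

Because the combiner only ever sees records that already share a partition, every key it merges is bound for the same reducer.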

    You can modify the WordCount MR program to verify this. My result for the input "the quick brown fox jumped over a lazy dog" is:


    Word, Step, Time
    fox, Mapper, **********754
    fox, Partitioner, **********754
    fox, Combiner, **********850
    fox, Reducer, **********904


    Clearly, the Combiner runs after the Partitioner.
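The following toy sketch shows one way to log (word, step, order) rows in the same shape as the table. It hard-codes the stage order, so it proves nothing about Hadoop itself. It only illustrates the instrumentation idea. `run_traced` and its character-sum partitioner are hypothetical, and it uses a sequence counter instead of wall-clock time, since the Mapper and Partitioner rows in the table appear to share the same visible timestamp digits and a counter cannot tie.

```python
import itertools


def run_traced(words, num_partitions=1):
    """Push words through toy stages and log (word, step, sequence number)."""
    seq = itertools.count()
    trace = []

    def log(word, step):
        trace.append((word, step, next(seq)))

    # Map side: emit each word and assign it to a partition right away.
    partitions = {}
    for word in words:
        log(word, "Mapper")
        pid = sum(map(ord, word)) % num_partitions  # hypothetical partitioner
        log(word, "Partitioner")
        partitions.setdefault(pid, []).append(word)

    # Spill: within each partition, visit keys in sorted order and combine them.
    combined = {}
    for pid in sorted(partitions):
        for word in sorted(set(partitions[pid])):
            log(word, "Combiner")
            combined[word] = partitions[pid].count(word)

    # Reduce side: one call per distinct key.
    for word in sorted(combined):
        log(word, "Reducer")
    return trace


if __name__ == "__main__":
    sentence = "the quick brown fox jumped over a lazy dog"
    for word, step, n in run_traced(sentence.split(), num_partitions=2):
        if word == "fox":
            print(word, step, n, sep=", ")
```

In a real job the equivalent log lines would go inside the job's own Mapper, Partitioner, Combiner, and Reducer classes, as the answer describes.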
