Between the partitioner and the combiner, which runs first?
I was of the opinion that the partitioner runs first, then the combiner, and then the keys are redirecte
Partition comes first.
According to "Hadoop: The Definitive Guide", the output of the Mapper is first written to a memory buffer, then spilled to a local directory when the buffer is about to overflow. The spilled data is partitioned according to the Partitioner, and within each partition the records are sorted and then combined if a Combiner is given.
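To make that ordering concrete, here is a minimal plain-Java sketch of one spill: partition, then sort, then combine. It is only a simulation for illustration and does not use Hadoop's own classes; the class name SpillSketch and the methods getPartition and spill are hypothetical. The partition formula is the one used by Hadoop's default HashPartitioner, and the combiner is assumed to be the usual wordcount sum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical simulation of a single map-side spill (not Hadoop code).
class SpillSketch {

    // Same formula as Hadoop's default HashPartitioner.
    static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Each input word stands for a mapper output record (word, 1).
    static List<SortedMap<String, Integer>> spill(List<String> words, int numPartitions) {
        // Step 1: assign every record to a partition.
        List<List<String>> buffered = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buffered.add(new ArrayList<>());
        }
        for (String word : words) {
            buffered.get(getPartition(word, numPartitions)).add(word);
        }

        List<SortedMap<String, Integer>> spilled = new ArrayList<>();
        for (List<String> partition : buffered) {
            // Step 2: sort the records inside the partition by key.
            Collections.sort(partition);
            // Step 3: combine equal keys (assumed combiner: sum the counts).
            SortedMap<String, Integer> combined = new TreeMap<>();
            for (String word : partition) {
                combined.merge(word, 1, Integer::sum);
            }
            spilled.add(combined);
        }
        return spilled;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "the quick brown fox jumped over a lazy dog".split(" "));
        List<SortedMap<String, Integer>> result = spill(words, 2);
        for (int p = 0; p < result.size(); p++) {
            System.out.println("partition " + p + ": " + result.get(p));
        }
    }
}
```

The combiner only ever sees records that already belong to one partition, which is why it cannot run before the partitioner.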
You can simply modify the wordcount MR program to verify it. My result for the input "the quick brown fox jumped over a lazy dog" is:
Word, Step, Time
fox, Mapper, **********754
fox, Partitioner, **********754
fox, Combiner, **********850
fox, Reducer, **********904
Clearly, the Combiner runs after the Partitioner.