I was wondering between partitioner and combiner, which runs first?
I was of the opinion it is the partitiner first and then combiner and then the keys are redirecte
Partitioner
runs before Combiner
: MapReduce Comprehensive Diagram.
You can have custom partition logic, and after mapper results are partitioned, the partitions are sorted and Combiner
is applied to the sorted partitions.
See Hadoop MapReduce Comprehensive Description.
I checked it by running a word-count program with custom Combiner
and Partitioner
with timestamps logging:
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682580 : hello : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682582 : hello : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682583 : hello : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682583 : world : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682584 : world : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682585 : hello : 1
Apr 23, 2018 2:41:22 PM mapreduce.WordCountPartitioner getPartition
INFO: Partitioner: 1524483682585 : world : 1
18/04/23 14:41:22 INFO mapred.LocalJobRunner:
18/04/23 14:41:22 INFO mapred.MapTask: Starting flush of map output
18/04/23 14:41:22 INFO mapred.MapTask: Spilling map output
18/04/23 14:41:22 INFO mapred.MapTask: bufstart = 0; bufend = 107; bufvoid = 104857600
18/04/23 14:41:22 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214368(104857472); length = 29/6553600
Apr 23, 2018 2:41:22 PM mapreduce.WordCountCombiner reduce
INFO: Combiner: 1524483682614 : hello
Apr 23, 2018 2:41:22 PM mapreduce.WordCountCombiner reduce
INFO: Combiner: 1524483682615 : world