What is the use of a grouping comparator in Hadoop MapReduce?

感情败类 2020-12-01 00:09

I would like to know why a grouping comparator is used in secondary sort in MapReduce.

According to the Definitive Guide's example of secondary sorting

We want t

4 answers
  •  不知归路
    2020-12-01 00:50

    The default partitioner hashes the whole key and uses that hash, modulo the number of reduce tasks, to pick a reducer, so identical keys always go to the same reducer. If your mapper emits a composite (natural + augment) key and you want all keys that share the same natural key to go to the same reducer, you have to implement a custom partitioner that looks only at the natural key.

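    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // In the new API (org.apache.hadoop.mapreduce) Partitioner is an abstract class, so it is extended
    public class SimplePartitioner extends Partitioner<Text, LongWritable> {
        @Override
        public int getPartition(Text compositeKey, LongWritable value, int numReduceTasks) {
            // Split the key into natural and augment parts and keep only the natural part
            String naturalKey = compositeKey.toString().split("separator")[0];

            // Clear the sign bit and take the modulus so the result lies in [0, numReduceTasks)
            return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }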
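To see the effect of hashing only the natural key, here is a minimal plain-Java sketch with no Hadoop dependency. The class name `NaturalKeyPartitionDemo`, the `#` separator, and the sample keys are all assumptions made for illustration; the arithmetic mirrors the partitioner above.

```java
import java.util.List;

// Plain-Java illustration (no Hadoop classes) of natural-key partitioning.
class NaturalKeyPartitionDemo {
    // Hypothetical separator between the natural and augment parts of a key.
    static final String SEPARATOR = "#";

    // Hash only the natural part, so the augment part cannot change the partition.
    static int partitionFor(String compositeKey, int numReduceTasks) {
        String naturalKey = compositeKey.split(SEPARATOR)[0];
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hash the whole composite key, as a default hash-based partitioner would.
    static int wholeKeyPartition(String compositeKey, int numReduceTasks) {
        return (compositeKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        for (String key : List.of("1900#35", "1900#34", "1901#36")) {
            System.out.println(key
                + " -> natural-key partition " + partitionFor(key, numReduceTasks)
                + ", whole-key partition " + wholeKeyPartition(key, numReduceTasks));
        }
    }
}
```

Keys that share a natural key always get the same natural-key partition, while the whole-key partition may or may not coincide, which is why the custom partitioner is needed.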

    The partitioner only decides which reducer receives a key. If you also want all records that share a natural key within a partition to be delivered to a single reduce() call, you must implement a grouping comparator that considers only the natural key.

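    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public class SimpleGroupingComparator extends WritableComparator {
        public SimpleGroupingComparator() {
            super(Text.class, true); // true: create key instances for deserialization
        }

        @Override
        public int compare(WritableComparable compositeKey1, WritableComparable compositeKey2) {
            // Text has no getNaturalKey(), so extract the natural part the same way the partitioner does
            String naturalKey1 = compositeKey1.toString().split("separator")[0];
            String naturalKey2 = compositeKey2.toString().split("separator")[0];
            return naturalKey1.compareTo(naturalKey2);
        }
    }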
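The role of the grouping comparator can be illustrated with a small plain-Java simulation of the reduce side. This is only a sketch under assumptions: `GroupingComparatorDemo`, the `#` separator, the integer augment part, and the sample keys are hypothetical, and no Hadoop classes are used. Keys are sorted by the full composite key, then a new group starts whenever the grouping comparator reports that two adjacent keys differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of how sorted keys are cut into reduce groups.
class GroupingComparatorDemo {
    // Hypothetical separator between the natural and augment parts of a key.
    static final String SEPARATOR = "#";

    static String naturalKey(String compositeKey) {
        return compositeKey.split(SEPARATOR)[0];
    }

    static int augment(String compositeKey) {
        return Integer.parseInt(compositeKey.split(SEPARATOR)[1]);
    }

    // Sort comparator: natural key first, then the augment part.
    static final Comparator<String> SORT =
        Comparator.comparing((String k) -> naturalKey(k))
                  .thenComparingInt((String k) -> augment(k));

    // Grouping comparator: natural key only.
    static final Comparator<String> GROUP =
        Comparator.comparing((String k) -> naturalKey(k));

    // Sort the keys, then open a new group each time adjacent keys compare as different.
    static List<List<String>> reduceGroups(List<String> keys, Comparator<String> grouping) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : sorted) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("1901#36", "1900#35", "1900#34", "1901#30");
        // Grouping on the natural key: [[1900#34, 1900#35], [1901#30, 1901#36]]
        System.out.println(reduceGroups(keys, GROUP));
        // Grouping on the full composite key: [[1900#34], [1900#35], [1901#30], [1901#36]]
        System.out.println(reduceGroups(keys, SORT));
    }
}
```

Without a natural-key grouping comparator, every distinct composite key forms its own group, so each reduce call would see a single record. With it, one call sees all records for a natural key, already ordered by the augment part.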
