Combiner Implementation and internal working

无人共我 2020-12-22 07:45

I want to use a combiner in my MapReduce code, say WordCount.

How should I implement it?

What sort of data is being passed to the reducer from the combiner?

3 Answers
  •  抹茶落季
    2020-12-22 08:06

    A combiner runs on the map side, between the mapper and the reducer, to reduce the amount of data transferred between the map and reduce phases.

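    A combiner is implemented much like a reducer: it provides the reduce method of Reducer (an interface in the old mapred API, a class to extend in the newer mapreduce API). Because the combiner's output is fed to the reducer, both its input and its output key/value types must match the mapper's output key/value types.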

    In the driver, when the reducer's input and output types are the same (as in WordCount), you can simply register the reducer class as the combiner:

    job.setCombinerClass(MyReducer.class);
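
To illustrate what the reducer receives from the combiner in WordCount, here is a minimal plain-Java simulation. It is a sketch only and does not use the Hadoop API: the class WordCountCombinerSketch and its methods are hypothetical names, and each input string stands in for one map task's split. The point it shows is that the combiner hands the reducer partial counts such as (the, 2) rather than a stream of (the, 1) pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of WordCount with a combiner (no Hadoop dependency).
// All names here are hypothetical and chosen for illustration only.
public class WordCountCombinerSketch {

    // "Mapper": emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Summing per key. The same logic serves as combiner and as reducer,
    // which is why the reducer class can be registered as the combiner.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Run map + combine per split, then reduce over all combiner outputs.
    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            // Combiner output for this map task: partial counts per word.
            shuffled.addAll(sumByKey(map(split)).entrySet());
        }
        return sumByKey(shuffled); // "Reducer": sum the partial counts.
    }

    public static void main(String[] args) {
        // What the reducer would receive from the first map task:
        System.out.println(sumByKey(map("the cat the dog"))); // {cat=1, dog=1, the=2}
        // Final result after reducing both map tasks' combiner outputs:
        System.out.println(run(List.of("the cat the dog", "the dog"))); // {cat=1, dog=2, the=3}
    }
}
```

In a real job the same shape applies: the mapper's output types, the combiner's input and output types, and the reducer's input types all coincide, here (String, Integer) standing in for (Text, IntWritable).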
    

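    A combiner can be used only when the reduce function is commutative and associative, because Hadoop may run the combiner zero, one, or several times on a mapper's output and the final result must be the same in every case.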

    For example, consider computing the maximum of a set of numbers:
    
    Map 1 output - (23, 27, 31) -> Combiner -> 31
    Map 2 output - (22, 36, 33, 45) -> Combiner -> 45
    Map 3 output - (41, 33, 15, 16) -> Combiner -> 41
    
    The combiner acts on each mapper's output separately.
    
    Combiner output - (31, 45, 41) -> Reducer -> 45
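
The maximum example can be replayed in a few lines of plain Java. This is only a sketch of the data flow, not Hadoop code: MaxCombinerSketch and its methods are hypothetical names, and each inner list stands in for one mapper's output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java replay of the maximum example (no Hadoop dependency).
// Class and method names are hypothetical.
public class MaxCombinerSketch {

    // The reduce function: maximum of a non-empty list of values.
    static int max(List<Integer> values) {
        return Collections.max(values);
    }

    // "Combiner": apply the reduce function to each mapper's output separately.
    static List<Integer> combineEach(List<List<Integer>> mapOutputs) {
        List<Integer> combined = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            combined.add(max(output));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<List<Integer>> mapOutputs = List.of(
                List.of(23, 27, 31),
                List.of(22, 36, 33, 45),
                List.of(41, 33, 15, 16));

        List<Integer> combined = combineEach(mapOutputs);
        System.out.println(combined);      // [31, 45, 41]
        System.out.println(max(combined)); // 45
    }
}
```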
    

    The amount of data transferred is reduced, as the example shows: only 3 values reach the reducer instead of 11.
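
As a counterpoint to the maximum example, the commutative and associative requirement rules out using a plain average as a combiner, since an average of averages is generally not the overall average. The sketch below, again plain Java with hypothetical names (MeanCombinerSketch and its methods), shows the mismatch and one common workaround, which is to have the combiner emit a (sum, count) pair and let the reducer do the final division.

```java
import java.util.List;

// Plain-Java illustration of why "mean" cannot be used directly as a combiner.
// Class and method names are hypothetical.
public class MeanCombinerSketch {

    // Arithmetic mean of a non-empty list.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Combiner-friendly partial aggregate: {sum, count}.
    static double[] sumAndCount(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return new double[] {sum, values.size()};
    }

    // "Reducer": merge the partial {sum, count} pairs, then divide once.
    static double meanFromPartials(List<double[]> partials) {
        double sum = 0;
        double count = 0;
        for (double[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<Double> map1 = List.of(1.0, 2.0, 3.0, 4.0);
        List<Double> map2 = List.of(10.0);

        // Wrong: averaging the per-mapper averages.
        System.out.println(mean(List.of(mean(map1), mean(map2)))); // 6.25

        // Right: combine (sum, count) pairs, divide in the reducer.
        System.out.println(meanFromPartials(
                List.of(sumAndCount(map1), sumAndCount(map2)))); // 4.0
    }
}
```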
