Combiner Implementation and internal working

Submitted by …衆ロ難τιáo~ on 2019-11-28 12:33:31

Question


I want to use a combiner in my MapReduce code, say WordCount.

How should I implement it?

What sort of data is passed from the combiner to the reducer?

It would be great if someone could provide code for both the Combiner and the Reducer.

It would be even better if you could explain how the combiner works.

I am new to MapReduce and still learning.

Thanks in advance :)


Answer 1:


A Combiner is also known as a semi-reducer.

The main function of a Combiner is to summarize the map output records that share the same key.

The Combiner class sits between the Map class and the Reduce class and reduces the volume of data transferred between Map and Reduce.

Explanation with a sample:

MAP Input:

What do you mean by Object
What do you know about Java
What is Java Virtual Machine
How Java enabled High Performance

MAP output

<What,1> <do,1> <you,1> <mean,1> <by,1> <Object,1>
<What,1> <do,1> <you,1> <know,1> <about,1> <Java,1>
<What,1> <is,1> <Java,1> <Virtual,1> <Machine,1>
<How,1> <Java,1> <enabled,1> <High,1> <Performance,1>

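Before the combiner runs, the framework groups this MAP output by key, so the Combiner receives each key together with its list of values.

Combiner input

   <What,1,1,1> <do,1,1> <you,1,1> <mean,1> <by,1> <Object,1>
   <know,1> <about,1> <Java,1,1,1>
   <is,1> <Virtual,1> <Machine,1>
   <How,1> <enabled,1> <High,1> <Performance,1>

Combiner output (with a summing combiner, assuming all four lines are processed by one map task)

   <What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
   <know,1> <about,1> <Java,3>
   <is,1> <Virtual,1> <Machine,1>
   <How,1> <enabled,1> <High,1> <Performance,1>

This combiner output is passed as input to the Reducer, which sums the partial counts it receives from every map task.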

Reducer Output

   <What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
   <know,1> <about,1> <Java,3>
   <is,1> <Virtual,1> <Machine,1>
   <How,1> <enabled,1> <High,1> <Performance,1>
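
The flow above can be sketched in plain Java without any Hadoop classes. This is only a simulation under stated assumptions: the class name WordCountFlowSketch and its methods are hypothetical, all four input lines are assumed to be handled by a single map task, and the same summing logic is used as both combiner and reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> combine -> reduce for word count.
// Hypothetical names; this does not use the Hadoop API.
class WordCountFlowSketch {

    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Used as both combiner and reducer: group pairs by key and sum the values.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "What do you mean by Object",
            "What do you know about Java",
            "What is Java Virtual Machine",
            "How Java enabled High Performance");

        List<Map.Entry<String, Integer>> mapOutput = map(input);
        // Combiner: runs on the map side, before data crosses the network.
        Map<String, Integer> combined = sumByKey(mapOutput);
        // Reducer: sums whatever partial counts arrive for each key.
        Map<String, Integer> reduced = sumByKey(new ArrayList<>(combined.entrySet()));

        System.out.println("records sent without combiner: " + mapOutput.size()); // 22
        System.out.println("records sent with combiner: " + combined.size());     // 16
        System.out.println("What=" + reduced.get("What") + " Java=" + reduced.get("Java")); // What=3 Java=3
    }
}
```

In this sketch the combiner cuts the records handed to the reducer from 22 to 16; the saving grows with the number of repeated words per map task.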

If you are using Java, the code below sets the Combiner and the Reducer to the same class, which works for word count because summing is commutative and associative.

  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);

Have a look at the working Java example at tutorialspoint.




Answer 2:


The combiner does the same kind of work as the reducer: it can implement the Reducer interface and override its reduce method. If you use a combiner, less network bandwidth is needed to transfer the intermediate data (the mapper output) to the reducer.

You can reuse the code of your own reducer's reduce method in the combiner's reduce method if the operation your reducer applies is both commutative and associative.
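
To see why that condition matters, here is a small plain-Java sketch (hypothetical class and method names, no Hadoop classes) that compares a sum, which is safe to reuse as a combiner, with a mean, which is not. The two lists stand in for the values that two assumed map tasks emit for the same key.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: applying the reducer logic per mapper first
// (as a combiner would) and then again in the reducer.
class CombinerSafetySketch {

    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        List<Double> mapperA = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> mapperB = Arrays.asList(10.0);
        List<Double> all = Arrays.asList(1.0, 2.0, 3.0, 10.0);

        // Sum: combining per mapper first gives the same answer.
        System.out.println(sum(all));                                        // 16.0
        System.out.println(sum(Arrays.asList(sum(mapperA), sum(mapperB))));  // 16.0

        // Mean: a mean of per-mapper means is not the overall mean.
        System.out.println(mean(all));                                        // 4.0
        System.out.println(mean(Arrays.asList(mean(mapperA), mean(mapperB)))); // 6.0
    }
}
```

A common workaround for a mean is to have the combiner emit partial (sum, count) pairs and let only the reducer do the final division.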

There is no guarantee that the combiner will run even if you configure one for your MapReduce application: Hadoop may apply it zero, one, or several times. For it to be applied when a map task's spill files are merged, there must be at least 3 spills (the default threshold).
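
Because the combiner is optional, the reducer has to produce the right answer whether or not it ran. The plain-Java sketch below (hypothetical names, no Hadoop classes) shows the values a reducer might see for one word in three assumed scenarios, and compares a summing reducer with one that wrongly counts the values it receives.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why reducer logic must not depend on
// whether the combiner was executed.
class CombinerOptionalSketch {

    // Robust: adds up whatever partial counts arrive.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Fragile: assumes every value is a raw 1 straight from the mapper.
    static int countValues(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Values for a word that occurs four times in total.
        List<Integer> combinerSkipped = Arrays.asList(1, 1, 1, 1); // raw map output
        List<Integer> combinerRanOnce = Arrays.asList(3, 1);       // partial sums
        List<Integer> combinerRanTwice = Arrays.asList(4);         // partial sums merged again

        for (List<Integer> values : Arrays.asList(combinerSkipped, combinerRanOnce, combinerRanTwice)) {
            System.out.println("sum=" + sum(values) + " count=" + countValues(values));
        }
        // sum is 4 in every scenario; count is 4, then 2, then 1.
    }
}
```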

For example, my mapper output is ,,,,,<34>. Without a combiner, my input to the reducer is . With a combiner, I can pass input to the reducer like ,.




Answer 3:


A combiner is used between the mapper and the reducer to reduce the amount of data transferred between the map and reduce phases.

A combiner is implemented much like a reducer: it implements the Reducer interface's reduce method. Because its output is fed to the reducer, both its input and its output key/value types must match the mapper's output types.
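
The type constraint can be sketched with plain Java generics. Hadoop's Reducer class takes four type parameters (input key, input value, output key, output value); the ReduceFn and CombineFn interfaces below are hypothetical stand-ins for it, not Hadoop API, and only illustrate that a combiner's output types have to equal its input types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;

// Hypothetical stand-ins for the Hadoop types, to illustrate the constraint only.
class CombinerTypesSketch {

    // A reduce function: (key, values) -> one output pair.
    interface ReduceFn<KIN, VIN, KOUT, VOUT> {
        Map.Entry<KOUT, VOUT> reduce(KIN key, Iterable<VIN> values);
    }

    // A combiner consumes and produces the map output types K and V.
    interface CombineFn<K, V> extends ReduceFn<K, V, K, V> {}

    // Summing keeps (String, Integer) in and out, so it can serve as a combiner.
    static final CombineFn<String, Integer> SUM = (key, values) -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return new SimpleEntry<>(key, total);
    };

    // This reducer changes the value type to String, so it is a valid
    // ReduceFn but could not be declared as a CombineFn<String, Integer>.
    static final ReduceFn<String, Integer, String, String> REPORT = (key, values) -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return new SimpleEntry<>(key, "count=" + total);
    };

    public static void main(String[] args) {
        System.out.println(SUM.reduce("Java", Arrays.asList(1, 1, 1)));    // Java=3
        System.out.println(REPORT.reduce("Java", Arrays.asList(3)));       // Java=count=3
    }
}
```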

In the driver, we can simply set our reducer class as the combiner:

job.setCombinerClass(MyReducer.class);

Combiners can be used only on functions which are commutative and associative.

For example, the maximum of a set of numbers:

Map 1 output - (23,27,31) -> Combiner -> 31
Map 2 output - (22,36,33,45) -> Combiner -> 45
Map 3 output - (41,33,15,16) -> Combiner -> 41

The combiner acts on the output of each mapper.

Combiner output - (31,45,41) -> Reducer -> 45

As the example shows, the amount of data transferred is reduced: the reducer receives 3 values instead of 11.
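
The maximum example can be checked with a short plain-Java sketch. The class and method names are hypothetical and no Hadoop classes are involved; each inner list stands for one mapper's output for a single key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of a max combiner followed by a max reducer.
class MaxCombinerSketch {

    // Combiner step: take the maximum of each mapper's output separately.
    static List<Integer> combine(List<List<Integer>> mapOutputs) {
        List<Integer> partialMaxima = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            partialMaxima.add(Collections.max(output));
        }
        return partialMaxima;
    }

    // Reducer step: take the maximum of whatever values arrive.
    static int reduce(List<Integer> values) {
        return Collections.max(values);
    }

    public static void main(String[] args) {
        List<List<Integer>> mapOutputs = Arrays.asList(
            Arrays.asList(23, 27, 31),
            Arrays.asList(22, 36, 33, 45),
            Arrays.asList(41, 33, 15, 16));

        List<Integer> combined = combine(mapOutputs);
        System.out.println(combined);         // [31, 45, 41]
        System.out.println(reduce(combined)); // 45
    }
}
```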



Source: https://stackoverflow.com/questions/33406566/combiner-implementation-and-internal-working
