"Combiner" Class in a MapReduce Job

Submitted by 我只是一个虾纸丫 on 2019-12-18 12:53:05

Question


A Combiner runs after the Mapper and before the Reducer. It receives as input all data emitted by the Mapper instances on a given node, then emits its output to the Reducers.

Also, if a reduce function is both commutative and associative, then it can be used as a Combiner.
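A minimal Python sketch of the data flow described above. It assumes a word-count job, two hypothetical nodes with made-up input lines, and a summing reducer that is reused as the combiner. This is a plain simulation, not Hadoop's API: the combiner only sees one node's map output, the final counts are the same with and without it, and fewer pairs are shuffled to the reducer when it runs.

```python
from collections import defaultdict

# Hypothetical input: the lines handled by the mappers on each of two nodes.
NODES = {
    "node1": ["a b a", "b c"],
    "node2": ["a c c", "a"],
}

def mapper(line):
    """Word-count mapper: emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1

def reduce_sum(key, values):
    """Reducer that sums counts; reused here as the combiner too."""
    return key, sum(values)

def group_by_key(pairs):
    """Collect values per key, as the framework does before a reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(use_combiner):
    """Return (final counts, number of pairs shuffled to the reducer)."""
    shuffled = []
    for lines in NODES.values():
        mapped = [pair for line in lines for pair in mapper(line)]
        if use_combiner:
            # The combiner sees only this node's map output.
            mapped = [reduce_sum(k, vs) for k, vs in group_by_key(mapped).items()]
        shuffled.extend(mapped)
    counts = dict(reduce_sum(k, vs) for k, vs in group_by_key(shuffled).items())
    return counts, len(shuffled)

print(run_job(use_combiner=False))  # ({'a': 4, 'b': 2, 'c': 3}, 9)
print(run_job(use_combiner=True))   # ({'a': 4, 'b': 2, 'c': 3}, 5)
```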

My question is: what does the phrase "commutative and associative" mean in this context?


Answer 1:


Assume you have a list of numbers, 1 2 3 4 5 6.

Associative here means you can apply your operation to any subgroup, then apply it again to the results of those subgroups, and get the same answer:

(1) + (2 + 3) + (4 + 5 + 6)
  ==
(1 + 2) + (3 + 4) + (5) + (6)
  ==
...

Think of the parentheses here as the execution of a combiner.
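To make the "parentheses as combiner runs" idea concrete, here is a small Python sketch (not from the answer) that splits the list into random contiguous groups, applies the operation to each group as a combiner would, and then applies it again to the partial results. The helper names are hypothetical; sum and max give the same answer however the list is split.

```python
import random

def random_groups(values, rng):
    """Split values into random contiguous groups (one per combiner run)."""
    groups, start = [], 0
    while start < len(values):
        size = rng.randint(1, len(values) - start)
        groups.append(values[start:start + size])
        start += size
    return groups

def combine_then_reduce(op, values, rng):
    """Apply op to each group (combiner), then to the partial results (reducer)."""
    partials = [op(group) for group in random_groups(values, rng)]
    return op(partials)

rng = random.Random(42)  # fixed seed so the run is repeatable
numbers = [1, 2, 3, 4, 5, 6]
for _ in range(5):
    assert combine_then_reduce(sum, numbers, rng) == sum(numbers)  # always 21
    assert combine_then_reduce(max, numbers, rng) == max(numbers)  # always 6
print("sum and max give the same answer for every grouping tried")
```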

Commutative means that the order doesn't matter, so:

1 + 2 + 3 + 4 + 5 + 6
  ==
2 + 4 + 6 + 1 + 3 + 5
  ==
...
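A quick Python check of the commutativity claim, assuming values may reach the reducer in any order after the shuffle: evaluating sum and max over every ordering of the list yields exactly one result each.

```python
from itertools import permutations

numbers = [1, 2, 3, 4, 5, 6]

# Collect the result of each operation over all 720 orderings of the list.
sums = {sum(p) for p in permutations(numbers)}
maxes = {max(p) for p in permutations(numbers)}
print(sums, maxes)  # {21} {6}
```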

Addition, for example, has both properties, as seen above. "Maximum" does as well: the max of maxes is the max, and max(a,b) == max(b,a).

Median is an example that doesn't work: the median of medians is not, in general, the true median.
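A small Python counterexample with made-up values, assuming one key's values are split unevenly across two map outputs. `statistics.median` is the standard-library function; the split itself is hypothetical. Max survives the split, median does not.

```python
from statistics import median

# Hypothetical split of one key's values across two map outputs.
split = [[1, 2, 100], [3, 4]]
all_values = [v for part in split for v in part]

true_median = median(all_values)                        # median of 1,2,3,4,100 -> 3
median_of_medians = median([median(p) for p in split])  # median of 2 and 3.5 -> 2.75
max_of_maxes = max(max(p) for p in split)               # 100, same as max(all_values)

print(true_median, median_of_medians, max_of_maxes)  # 3 2.75 100
```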


Don't forget another important property of a combiner: the key/value input types and the key/value output types need to be the same. For example, you can't take in a string:int and return a string:float.

Often the reducer outputs some sort of string instead of a numerical value, which may prevent you from simply plugging in your reducer as the combiner.
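A Python sketch of the type problem, assuming a hypothetical reducer that formats its result as text. Reusing it as the combiner feeds a string back into the reducer's sum, which fails, while a dedicated combiner that keeps (str, int) in and out works. Python only catches this at run time.

```python
def reducer(key, counts):
    # Hypothetical reducer that formats its output as text for a report.
    return key, f"{key} appeared {sum(counts)} times"

def combiner(key, counts):
    # Dedicated combiner: same types in and out (str key, int value).
    return key, sum(counts)

node_output = {"apple": [1, 1, 1]}  # map output on one node
other_node = [2]                    # assumed count already combined on another node

# Dedicated combiner: the reducer still receives ints.
combined = [combiner(k, v)[1] for k, v in node_output.items()]
print(reducer("apple", combined + other_node))  # ('apple', 'apple appeared 5 times')

# Reducer reused as combiner: the reducer now receives a string among the ints.
bad = [reducer(k, v)[1] for k, v in node_output.items()]
try:
    reducer("apple", bad + other_node)
except TypeError as err:
    print("reducer cannot be the combiner:", err)
```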




Answer 2:


For commutativity, let's say your reducer can be represented by a function (in the mathematical sense) called f(). Then your reducer is commutative if f(a, b) = f(b, a). For instance:

  • sum(A, B) is the same as sum(B, A)
  • xor(A, B) is the same as xor(B, A)
  • concat(A, B) is not the same as concat(B, A)
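A Python sketch that tests f(a, b) == f(b, a) on a few sample values. A sample check like this can only find counterexamples, not prove the property; the helper `is_commutative` is hypothetical. `operator.add` on strings performs concatenation.

```python
from itertools import product
from operator import add, xor

def is_commutative(f, samples):
    """Check f(a, b) == f(b, a) for every ordered pair drawn from samples."""
    return all(f(a, b) == f(b, a) for a, b in product(samples, repeat=2))

ints = range(-3, 4)
words = ["x", "yz", ""]

print(is_commutative(add, ints))   # True  (sum)
print(is_commutative(xor, ints))   # True  (xor)
print(is_commutative(add, words))  # False (string concat: "x" + "yz" != "yz" + "x")
```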

For associativity, the property is that f(f(a, b), c) = f(a, f(b, c)). For example:

  • (A + B) + C is the same as A + (B + C)
  • (A - B) - C is not the same as A - (B - C)
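The same kind of sample check for associativity, f(f(a, b), c) == f(a, f(b, c)), again with a hypothetical helper. String concatenation passes this test even though it is not commutative, which shows the two properties are independent.

```python
from itertools import product
from operator import add, sub

def is_associative(f, samples):
    """Check f(f(a, b), c) == f(a, f(b, c)) for every triple from samples."""
    return all(
        f(f(a, b), c) == f(a, f(b, c))
        for a, b, c in product(samples, repeat=3)
    )

samples = range(-3, 4)
print(is_associative(add, samples))             # True:  (A + B) + C == A + (B + C)
print(is_associative(max, samples))             # True
print(is_associative(sub, samples))             # False: (1 - 2) - 3 == -4, 1 - (2 - 3) == 2
print(is_associative(add, ["x", "yz", ""]))     # True:  concat is associative
```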

So in the context of Map/Reduce, your reducer has to respect these two properties to be used as a combiner. For example, if your reducer just computes a sum() or a max(), it respects both properties, but something like mean() or median() does not, and thus you cannot use it as a combiner.
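A Python sketch of why mean() fails as a combiner, with made-up values for one key split unevenly across two map outputs. The (sum, count) workaround shown second is a common technique, not something the answer describes: the pairs can be added in any grouping and order, and the division happens only once, in the reducer.

```python
from statistics import mean

# Hypothetical values for one key, split unevenly across two map outputs.
split = [[10, 20, 30], [40]]
all_values = [v for part in split for v in part]

# Naive combiner: average each part, then average the averages.
mean_of_means = mean(mean(p) for p in split)  # mean of 20 and 40 -> 30

# Workaround: combine (sum, count) pairs, divide only in the final reducer.
partials = [(sum(p), len(p)) for p in split]  # [(60, 3), (40, 1)]
total, count = map(sum, zip(*partials))       # 100, 4
print(mean(all_values), mean_of_means, total / count)  # 25 30 25.0
```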

I personally see combiners as mini-reducers that run in memory after the map phase, as an optimization to reduce network traffic, and the commutativity/associativity requirement makes sense once you view Map/Reduce this way.
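A Python sketch of the "mini-reducer" view. It assumes, as Hadoop does, that the framework gives no guarantee about how many times the combiner runs on a map task's buffered output (zero, one, or several). The helper `run_combiner_some_times` is a hypothetical stand-in for that behavior: each pass collapses a random run of records, so fewer records may be shipped, but the reducer's total never changes.

```python
import random

def sum_combiner(values):
    """Mini-reducer: collapse a run of partial counts into a single count."""
    return [sum(values)]

def run_combiner_some_times(values, times, rng):
    """Hypothetical stand-in for the framework applying the combiner `times`
    times, each time to a random non-empty contiguous run of records."""
    for _ in range(times):
        start = rng.randrange(len(values))
        end = rng.randint(start + 1, len(values))
        values = values[:start] + sum_combiner(values[start:end]) + values[end:]
    return values

rng = random.Random(7)
records = [1] * 10  # ten (word, 1) map outputs for one word
for times in range(4):  # zero, one, two or three combiner passes
    shipped = run_combiner_some_times(records, times, rng)
    # The record count varies with the passes; the sum is always 10.
    print(times, len(shipped), sum(shipped))
```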



Source: https://stackoverflow.com/questions/10220371/combiner-class-in-a-mapreduce-job
