Hadoop handling data skew in reducer

[愿得一人] 2020-12-18 17:32

I am trying to determine whether there are hooks available in the Hadoop API (Hadoop 2.0.0, MRv1) to handle data skew for a reducer. Scenario: I have a custom Composite key a

2 Answers
  • 2020-12-18 17:44

    This idea comes to mind; I am not sure how good it is.

    Let's say you are currently running the job with 10 mappers and it is failing because of the data skew. The idea is to set the number of reducers to 15 and also define the maximum number of (key, value) pairs that should go to one reducer from each mapper. You keep that information in a hash map in your custom partitioner class. Once a particular reducer reaches the limit, you start sending the next set of (key, value) pairs to another reducer from the extra 5 reducers we have kept for handling the skew. A rough sketch of such a partitioner follows.
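
    An untested sketch of that idea, assuming Text keys and values, 10 "primary" reducers plus 5 overflow reducers, and hypothetical configuration property names (skew.primary.reducers, skew.max.per.reducer):

        import java.util.HashMap;
        import java.util.Map;

        import org.apache.hadoop.conf.Configurable;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        public class SkewAwarePartitioner extends Partitioner<Text, Text> implements Configurable {

            private Configuration conf;
            private int primaryReducers;   // reducers used by normal hash partitioning
            private long maxPerReducer;    // cap on records this map task sends to any one reducer
            private int nextOverflow = 0;  // round-robin pointer into the overflow reducers
            private final Map<Integer, Long> sentCounts = new HashMap<>();  // per-reducer counts for this map task

            @Override
            public void setConf(Configuration conf) {
                this.conf = conf;
                this.primaryReducers = conf.getInt("skew.primary.reducers", 10);
                this.maxPerReducer = conf.getLong("skew.max.per.reducer", 1000000L);
            }

            @Override
            public Configuration getConf() {
                return conf;
            }

            @Override
            public int getPartition(Text key, Text value, int numPartitions) {
                // Normal hash partitioning over the primary reducers only.
                int partition = (key.hashCode() & Integer.MAX_VALUE) % primaryReducers;
                long count = sentCounts.merge(partition, 1L, Long::sum);

                if (count > maxPerReducer && numPartitions > primaryReducers) {
                    // This map task has already sent its quota to that reducer, so divert
                    // further records to one of the extra reducers. Note that the same key
                    // now reaches two reducers, so the output for that key is partial and
                    // needs a follow-up merge step.
                    partition = primaryReducers + (nextOverflow++ % (numPartitions - primaryReducers));
                }
                return partition;
            }
        }

    In the driver you would then set job.setNumReduceTasks(15) and job.setPartitionerClass(SkewAwarePartitioner.class).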

  • 2020-12-18 17:46

    If your process allows it, using a Combiner (a reduce-type function) could help. If you pre-aggregate the data on the mapper side, then even if all your data ends up in the same reducer, the amount of data may be manageable (see the sketch at the end of this answer).

    An alternative could be to reimplement the partitioner to avoid the skewed case.
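
    To illustrate the combiner suggestion above, here is a minimal sketch assuming the values are LongWritable counts, so the reduce function is associative and commutative and the same class can serve as both combiner and reducer (the class name is illustrative):

        import java.io.IOException;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

    In the driver, job.setReducerClass(SumReducer.class) together with job.setCombinerClass(SumReducer.class) makes each map task emit one partially aggregated record per key instead of one record per input row, which shrinks what a skewed key delivers to its reducer.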
