Hadoop handling data skew in reducer

[愿得一人] 2020-12-18 17:32

I am trying to determine whether there are hooks available in the Hadoop API (Hadoop 2.0.0, MRv1) to handle data skew for a reducer. Scenario: I have a custom Composite key a

2 Answers
  • 2020-12-18 17:44

    This idea comes to mind; I am not sure how good it is.

    Let's say you are currently running the job with 10 mappers and it is failing because of the data skew. The idea is to set the number of reducers to 15 and also define the maximum number of (key, value) pairs that should go to one reducer from each mapper. You keep that information in a hash map in your custom partitioner class. Once a particular reducer reaches the limit, you start sending the next set of (key, value) pairs to another reducer from the extra 5 reducers we have kept for handling the skew. A rough sketch of such a partitioner follows.
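
    An untested sketch of that idea, assuming Text keys and values, 10 "primary" reducers plus 5 overflow reducers, and hypothetical configuration property names (skew.primary.reducers, skew.max.per.reducer):

        import java.util.HashMap;
        import java.util.Map;

        import org.apache.hadoop.conf.Configurable;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        public class SkewAwarePartitioner extends Partitioner<Text, Text> implements Configurable {

            private Configuration conf;
            private int primaryReducers;   // reducers used by normal hash partitioning
            private long maxPerReducer;    // cap on records this map task sends to any one reducer
            private int nextOverflow = 0;  // round-robin pointer into the overflow reducers
            private final Map<Integer, Long> sentCounts = new HashMap<>();  // per-reducer counts for this map task

            @Override
            public void setConf(Configuration conf) {
                this.conf = conf;
                this.primaryReducers = conf.getInt("skew.primary.reducers", 10);
                this.maxPerReducer = conf.getLong("skew.max.per.reducer", 1000000L);
            }

            @Override
            public Configuration getConf() {
                return conf;
            }

            @Override
            public int getPartition(Text key, Text value, int numPartitions) {
                // Normal hash partitioning over the primary reducers only.
                int partition = (key.hashCode() & Integer.MAX_VALUE) % primaryReducers;
                long count = sentCounts.merge(partition, 1L, Long::sum);

                if (count > maxPerReducer && numPartitions > primaryReducers) {
                    // This map task has already sent its quota to that reducer, so divert
                    // further records to one of the extra reducers. Note that the same key
                    // now reaches two reducers, so the output for that key is partial and
                    // needs a follow-up merge step.
                    partition = primaryReducers + (nextOverflow++ % (numPartitions - primaryReducers));
                }
                return partition;
            }
        }

    In the driver you would then set job.setNumReduceTasks(15) and job.setPartitionerClass(SkewAwarePartitioner.class).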

  • 2020-12-18 17:46

    If your process allows it, using a Combiner (a reduce-type function) could help. If you pre-aggregate the data on the mapper side, then even if all your data ends up in the same reducer, the amount of data may be manageable (see the sketch at the end of this answer).

    An alternative could be to reimplement the partitioner to avoid the skewed case.
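
    To illustrate the combiner suggestion above, here is a minimal sketch assuming the values are LongWritable counts, so the reduce function is associative and commutative and the same class can serve as both combiner and reducer (the class name is illustrative):

        import java.io.IOException;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

    In the driver, job.setReducerClass(SumReducer.class) together with job.setCombinerClass(SumReducer.class) makes each map task emit one partially aggregated record per key instead of one record per input row, which shrinks what a skewed key delivers to its reducer.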
