Is it better to use the mapred or the mapreduce package to create a Hadoop Job?

佐手、 posted on 2019-12-17 04:25:14

Question


To create MapReduce jobs you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers, Reducers, Jobs, and so on. The first one had been marked as deprecated, but this has since been reverted. Now I wonder whether it is better to use the old mapred package or the new mapreduce package to create a job, and why. Or does it just depend on whether you need something like MultipleTextOutputFormat, which is only available in the old mapred package?


Answer 1:


Functionality-wise, there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that in the old API records are pushed to the mapper/reducer, while the new API supports both push and pull mechanisms. You can get more information about the pull mechanism here.
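To make the push/pull distinction concrete, here is a minimal sketch in plain Java. It does not use Hadoop: RecordContext, PushMapper, PullMapper and FirstTwoMapper are hypothetical stand-ins invented for illustration. The pull variant loosely mirrors the new API, where Mapper.run(Context) loops over context.nextKeyValue() and can be overridden, so the mapper itself decides when to fetch the next record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a task context; NOT a Hadoop class.
class RecordContext {
    private final Iterator<String> records;
    private String current;
    final List<String> output = new ArrayList<>();

    RecordContext(List<String> input) {
        this.records = input.iterator();
    }

    // Advances to the next record; returns false when the input is exhausted.
    boolean nextValue() {
        if (!records.hasNext()) {
            return false;
        }
        current = records.next();
        return true;
    }

    String currentValue() {
        return current;
    }

    void write(String value) {
        output.add(value);
    }
}

// Push style (old API idea): the framework owns the loop and hands over one record per call.
class PushMapper {
    void map(String value, List<String> output) {
        output.add(value.toUpperCase());
    }
}

// Pull style (new API idea): the mapper owns the loop in run() and asks for records itself.
class PullMapper {
    void map(String value, RecordContext context) {
        context.write(value.toUpperCase());
    }

    // Default behaviour: drain every record, which looks like push from the outside.
    void run(RecordContext context) {
        while (context.nextValue()) {
            map(context.currentValue(), context);
        }
    }
}

// Overriding run() lets the mapper stop pulling early; here, after two records.
class FirstTwoMapper extends PullMapper {
    @Override
    void run(RecordContext context) {
        int seen = 0;
        while (seen < 2 && context.nextValue()) {
            map(context.currentValue(), context);
            seen++;
        }
    }
}

class PushPullDemo {
    // Simulates a framework that pushes each record into the mapper.
    static List<String> runPush(List<String> input) {
        PushMapper mapper = new PushMapper();
        List<String> out = new ArrayList<>();
        for (String record : input) {
            mapper.map(record, out);
        }
        return out;
    }

    // Simulates a framework that only calls run() and lets the mapper pull.
    static List<String> runPull(PullMapper mapper, List<String> input) {
        RecordContext context = new RecordContext(input);
        mapper.run(context);
        return context.output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(runPush(input));                       // [A, B, C]
        System.out.println(runPull(new PullMapper(), input));     // [A, B, C]
        System.out.println(runPull(new FirstTwoMapper(), input)); // [A, B]
    }
}
```

With the default run(), the pull style behaves exactly like push; the difference only shows once run() is overridden, as in FirstTwoMapper.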

Also, the old API has been un-deprecated since 0.21. You can find more information about the new API here.

As you mentioned, some classes (like MultipleTextOutputFormat) have not been migrated to the new API. For this reason, and because the old API is no longer deprecated, it's better to stick to the old API (although a translation is usually quite simple).




Answer 2:


Both the old and new APIs are good. The new API is cleaner, though. Use the new API wherever you can, and use the old one wherever you need specific classes that are not present in the new API (like MultipleTextOutputFormat).

But take care not to mix the old and new APIs in the same MapReduce job. That leads to weird problems.




Answer 3:


Old API (mapred)

  1. Lives in the package org.apache.hadoop.mapred

  2. Job configuration is done through JobConf, which extends the Configuration class

  3. Reduces values for a given key, based on an Iterator

  4. Package Summary

New API (mapreduce)

  1. Lives in the package org.apache.hadoop.mapreduce

  2. Job configuration is done through the separate Job class, together with a plain Configuration object

  3. Reduces values for a given key, based on an Iterable

  4. Package Summary
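The Iterator versus Iterable point in item 3 can be sketched in plain Java. This is only an illustration and does not use Hadoop: ReduceSignatures, sumOldStyle and sumNewStyle are hypothetical names. In Hadoop itself, the old reduce method receives an Iterator of values plus an OutputCollector and a Reporter, while the new one receives an Iterable of values plus a Context.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class for illustration; NOT part of Hadoop.
class ReduceSignatures {

    // Old-API style: values arrive as an Iterator, so the loop is written by hand.
    static int sumOldStyle(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New-API style: values arrive as an Iterable, so a for-each loop works.
    static int sumNewStyle(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 2, 3);
        System.out.println(sumOldStyle(counts.iterator())); // 6
        System.out.println(sumNewStyle(counts));            // 6
    }
}
```

Both variants compute the same result; the new signature mainly makes the reducer body shorter and easier to read.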



Source: https://stackoverflow.com/questions/7598422/is-it-better-to-use-the-mapred-or-the-mapreduce-package-to-create-a-hadoop-job
