Question
I noticed that there are two sets of Hadoop configuration parameters: one prefixed with mapred.* and the other with mapreduce.*. I am guessing this is due to the old API vs. the new API, but if I am not mistaken, both sets seem to coexist in the new API. Am I correct? If so, is there a general rule for what mapred.* is used for and what mapreduce.* is used for?
Answer 1:
Examining the source for 0.20.2, there are only a few mapreduce.* properties, and they revolve around configuring the job's input/output format and its mapper, combiner, reducer and partitioner classes. They also signal to the job client that the user is using the new API (look through the source for o.a.h.mapreduce.Job, specifically the setUseNewAPI() method):
mapreduce.inputformat.class
mapreduce.outputformat.class
mapreduce.partitioner.class
mapreduce.map.class
mapreduce.combine.class
mapreduce.reduce.class
There are some more properties, but they are secondary configuration.
The input and output formats, whether new or old API versions, typically use mapred.* properties.
For example, to specify your MapReduce input paths you use mapred.input.dir (whether you're using the new or old API). The same goes for the output property, mapred.output.dir.
So the long and the short of it is: if there isn't a utility method to configure the property (such as FileInputFormat.setInputPaths(Job, String)), then you'll need to check the source.
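To make the split between the two prefixes concrete, here is a minimal sketch that lays out the key names mentioned above. It uses java.util.Properties as a stand-in for Hadoop's Configuration so that it runs without Hadoop on the classpath. That substitution is an assumption for illustration, as are the class name ConfigKeySketch, the helper names, and every value (class names and paths are placeholders). Only the key names come from the 0.20.2 discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Sketch only: java.util.Properties stands in for Hadoop's Configuration.
class ConfigKeySketch {

    // Builds a stand-in job configuration using key names from the answer.
    // All values (class names, paths) are hypothetical placeholders.
    static Properties sampleJobConf() {
        Properties conf = new Properties();
        // New-API class selection lives under mapreduce.*
        conf.setProperty("mapreduce.map.class", "com.example.MyMapper");
        conf.setProperty("mapreduce.reduce.class", "com.example.MyReducer");
        conf.setProperty("mapreduce.inputformat.class", "com.example.MyInputFormat");
        // Input/output locations still live under mapred.*
        conf.setProperty("mapred.input.dir", "/data/in");
        conf.setProperty("mapred.output.dir", "/data/out");
        return conf;
    }

    // Returns the keys that start with the given prefix, in sorted order.
    // The trailing dot in the prefix matters: "mapred." does not match
    // "mapreduce.map.class".
    static List<String> keysWithPrefix(Properties conf, String prefix) {
        List<String> keys = new ArrayList<>();
        for (String name : new TreeSet<>(conf.stringPropertyNames())) {
            if (name.startsWith(prefix)) {
                keys.add(name);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Properties conf = sampleJobConf();
        System.out.println("mapreduce.*: " + keysWithPrefix(conf, "mapreduce."));
        System.out.println("mapred.*:    " + keysWithPrefix(conf, "mapred."));
    }
}
```

In a real 0.20.x job you would normally not set these keys by hand. The Job setters and FileInputFormat.setInputPaths(Job, String) write them for you, which is why the key names only matter when no such utility method exists.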
Answer 2:
Yes, the mapred library has been deprecated. The mapreduce library is new in Hadoop 0.20.1.
However, you can still use some of the features offered by mapred, which is why you still find it in the directory.
Please have a look at this link to know what features you can still use: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/package-summary.html
Answer 3:
hadoop.mapred has been deprecated. Versions before 0.20.1 used mapred. Versions after that use mapreduce. I do not think they coexist.
Source: https://stackoverflow.com/questions/10986633/hadoop-configuration-mapred-vs-mapreduce