Question:
Mapper:
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
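To see what the mapper's substring logic actually extracts, here is a small plain-Java sketch (standard library only, no Hadoop dependency; the class and helper names are invented for illustration, and the synthetic line is not a real NCDC record, just '0'-padding with the relevant columns filled in):

```java
import java.util.Arrays;

public class RecordParseDemo {

    // Mirrors the substring logic of the mapper on a single line.
    // Returns {year, temperature} as strings.
    static String[] parse(String line) {
        String year = line.substring(15, 19);
        int airTemperature = line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        return new String[] { year, Integer.toString(airTemperature) };
    }

    // Builds a synthetic 93-character fixed-width line (not a real NCDC
    // record; the '0' padding is arbitrary) carrying the given fields.
    static String sampleLine(String year, String signedTemp, char quality) {
        char[] raw = new char[93];
        Arrays.fill(raw, '0');
        year.getChars(0, 4, raw, 15);        // columns 15-18: year
        signedTemp.getChars(0, 5, raw, 87);  // columns 87-91: sign + temperature
        raw[92] = quality;                   // column 92: quality code
        return new String(raw);
    }

    public static void main(String[] args) {
        String line = sampleLine("1950", "+0022", '1');
        String[] parsed = parse(line);
        System.out.println(parsed[0] + " -> " + parsed[1]); // → 1950 -> 22
    }
}
```

Note that the `key` parameter plays no part in any of this parsing, which is exactly what the question below is about.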
Reducer:
Four formal type parameters are used to specify the input and output types, this time for the reduce function. The input types of the reduce function must match the output types of the map function: Text and IntWritable.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
But in this example, the key is never used.
What is the purpose of the key in the Mapper, given that it is not used at all?
Why is the key a LongWritable?
Answer 1:
The input format used in this example is TextInputFormat, which produces key/value pairs of type LongWritable/Text.
Here the LongWritable key is the byte offset of the current line within the input split of the given file, and the Text value is the current line itself.
This line-offset key is not necessarily useless; whether it matters depends on the use case. In your case the input key simply is not significant.
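To make the offset concrete, here is a small plain-Java sketch (standard library only, no Hadoop dependency; the class and method names are invented for illustration) of how the byte offsets that TextInputFormat hands to the mapper as LongWritable keys line up with the lines of a file:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {

    // Returns the byte offset of each line start, i.e. the values
    // TextInputFormat would pass to the mapper as LongWritable keys.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n", -1)) {
            result.add(offset);
            // +1 for the '\n' terminator consumed by the record reader
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "year,temp\n1950,0022\n1949,0111";
        System.out.println(offsets(data)); // → [0, 10, 20]
    }
}
```

One common use of this key: when key.get() == 0, the mapper knows it is looking at the very first line of a file (e.g. a header row it may want to skip).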
There are numerous InputFormat types other than TextInputFormat, which parse the lines of the input file in different ways and produce their own key/value pairs.
For example, KeyValueTextInputFormat parses every line using a configurable delimiter (a tab character by default) and produces key/value pairs of type Text/Text.
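As a rough plain-Java sketch of that behavior (no Hadoop classes; KeyValueTextInputFormat's default tab delimiter is assumed, and the class and method names here are invented for illustration):

```java
public class KeyValueSplit {

    // Mimics KeyValueTextInputFormat's default behavior: split each line
    // at the first occurrence of the delimiter; text before it becomes
    // the key, text after it becomes the value. A line containing no
    // delimiter becomes a key with an empty value.
    static String[] parse(String line, char delimiter) {
        int pos = line.indexOf(delimiter);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = parse("1950\t0022", '\t');
        System.out.println(kv[0] + " -> " + kv[1]); // → 1950 -> 0022
    }
}
```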
Edit: below is a list of a few input formats and their key/value types:

KeyValueTextInputFormat     Text/Text
NLineInputFormat            LongWritable/Text
FixedLengthInputFormat      LongWritable/BytesWritable
Besides these, a few input formats, such as SequenceFileInputFormat and CombineFileInputFormat, take generics-based custom key/value types upon declaration. Take a look at the input-formats chapter of Hadoop: The Definitive Guide.
Hope this helps.
Answer 2:
The JobConf class returns LongWritable as the default output key class if you do not set it with
job.setOutputKeyClass(...)
From the JobConf code:
public Class<?> getOutputKeyClass() {
  return getClass(JobContext.OUTPUT_KEY_CLASS,
                  LongWritable.class, Object.class);
}
Source: https://stackoverflow.com/questions/32650835/why-longwritable-key-has-not-been-used-in-mapper-class