Question:
Mapper:
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
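To see what the mapper's substring logic actually extracts, here is a small plain-Java sketch (standard library only, no Hadoop dependency; the class and helper names are invented for illustration, and the synthetic line is not a real NCDC record, just '0'-padding with the relevant columns filled in):

```java
import java.util.Arrays;

public class RecordParseDemo {

    // Mirrors the substring logic of the mapper on a single line.
    // Returns {year, temperature} as strings.
    static String[] parse(String line) {
        String year = line.substring(15, 19);
        int airTemperature = line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        return new String[] { year, Integer.toString(airTemperature) };
    }

    // Builds a synthetic 93-character fixed-width line (not a real NCDC
    // record; the '0' padding is arbitrary) carrying the given fields.
    static String sampleLine(String year, String signedTemp, char quality) {
        char[] raw = new char[93];
        Arrays.fill(raw, '0');
        year.getChars(0, 4, raw, 15);        // columns 15-18: year
        signedTemp.getChars(0, 5, raw, 87);  // columns 87-91: sign + temperature
        raw[92] = quality;                   // column 92: quality code
        return new String(raw);
    }

    public static void main(String[] args) {
        String line = sampleLine("1950", "+0022", '1');
        String[] parsed = parse(line);
        System.out.println(parsed[0] + " -> " + parsed[1]); // → 1950 -> 22
    }
}
```

Note that the `key` parameter plays no part in any of this parsing, which is exactly what the question below is about.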
Reducer:
Four formal type parameters are used to specify the input and output types, this time for the reduce function. The input types of the reduce function must match the output types of the map function: Text and IntWritable.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
But in this example, the key is never used.
What is the purpose of the key in the Mapper, given that it is not used at all?
Why is the key a LongWritable?
Answer 1:
The input format used in this example is TextInputFormat, which produces key/value pairs of type LongWritable/Text.
Here the LongWritable key is the byte offset of the current line within the input split of the given file, and the Text value is the current line itself.
This line-offset key is not necessarily useless; whether it matters depends on the use case. In your case the input key simply is not significant.
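To make the offset concrete, here is a small plain-Java sketch (standard library only, no Hadoop dependency; the class and method names are invented for illustration) of how the byte offsets that TextInputFormat hands to the mapper as LongWritable keys line up with the lines of a file:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {

    // Returns the byte offset of each line start, i.e. the values
    // TextInputFormat would pass to the mapper as LongWritable keys.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n", -1)) {
            result.add(offset);
            // +1 for the '\n' terminator consumed by the record reader
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "year,temp\n1950,0022\n1949,0111";
        System.out.println(offsets(data)); // → [0, 10, 20]
    }
}
```

One common use of this key: when key.get() == 0, the mapper knows it is looking at the very first line of a file (e.g. a header row it may want to skip).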
There are numerous InputFormat types other than TextInputFormat, which parse the lines of the input file in different ways and produce their own key/value pairs.
For example, KeyValueTextInputFormat parses every line using a configurable delimiter (a tab character by default) and produces key/value pairs of type Text/Text.
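As a rough plain-Java sketch of that behavior (no Hadoop classes; KeyValueTextInputFormat's default tab delimiter is assumed, and the class and method names here are invented for illustration):

```java
public class KeyValueSplit {

    // Mimics KeyValueTextInputFormat's default behavior: split each line
    // at the first occurrence of the delimiter; text before it becomes
    // the key, text after it becomes the value. A line containing no
    // delimiter becomes a key with an empty value.
    static String[] parse(String line, char delimiter) {
        int pos = line.indexOf(delimiter);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = parse("1950\t0022", '\t');
        System.out.println(kv[0] + " -> " + kv[1]); // → 1950 -> 0022
    }
}
```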
Edit: below is a list of a few input formats and their key/value types:

KeyValueTextInputFormat     Text/Text
NLineInputFormat            LongWritable/Text
FixedLengthInputFormat      LongWritable/BytesWritable
Besides these, a few input formats, such as SequenceFileInputFormat and CombineFileInputFormat, take generics-based custom key/value types upon declaration. Take a look at the input-formats chapter of Hadoop: The Definitive Guide.
Hope this helps.
Answer 2:
The JobConf class returns LongWritable as the default output key class if you do not set it with
job.setOutputKeyClass(...)
From the JobConf code:
public Class<?> getOutputKeyClass() {
  return getClass(JobContext.OUTPUT_KEY_CLASS,
                  LongWritable.class, Object.class);
}
Source: https://stackoverflow.com/questions/32650835/why-longwritable-key-has-not-been-used-in-mapper-class