Why is the LongWritable key not used in the Mapper class?

Submitted by 隐身守侯 on 2019-12-08 16:03:36

The input format used in this example is TextInputFormat, which produces key/value pairs of type LongWritable/Text.

Here the LongWritable key is the byte offset of the current line within the input split of the given file, while the Text value is the current line itself.

That does not mean the line-offset value carried by the LongWritable key is useless; whether it matters depends on the use case, and in your case this input key simply is not significant.
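To make the offset key concrete, here is a small plain-Java sketch (no Hadoop dependency) that computes, for each line of a split, the byte offset TextInputFormat would hand to the mapper as its LongWritable key. The `offsets` helper and the sample text are illustrative, not Hadoop code; it assumes UTF-8 lines terminated by a single '\n'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    // Returns the byte offset of each line, mirroring the LongWritable key
    // that TextInputFormat would pass to the mapper for that line.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            result.add(offset);
            // next key = current offset + line bytes + 1 byte for the '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String split = "first line\nsecond line\nthird\n";
        String[] lines = split.split("\n");
        List<Long> keys = offsets(split);
        for (int i = 0; i < keys.size(); i++) {
            // prints offset/line pairs: 0, 11, 23
            System.out.println(keys.get(i) + "\t" + lines[i]);
        }
    }
}
```

Running this prints each line prefixed with its starting byte offset, which is exactly the value your mapper receives (and may ignore) as its key.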

There are also numerous InputFormat types other than TextInputFormat, which parse the lines of the input file in different ways and produce their own key/value pairs.

For example, KeyValueTextInputFormat is a subclass of TextInputFormat; it parses every line using a configurable delimiter and produces key/value pairs of type Text/Text.
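The splitting rule is simple: the key is everything before the first delimiter (a tab by default, configurable via the mapreduce.input.keyvaluelinerecordreader.key.value.separator property), and the value is everything after it. A minimal plain-Java sketch of that rule, not Hadoop's actual record reader:

```java
public class KeyValueSplit {
    // Split a line into key/value at the first occurrence of the delimiter,
    // the way KeyValueTextInputFormat does (default delimiter is a tab).
    static String[] split(String line, char delimiter) {
        int pos = line.indexOf(delimiter);
        if (pos == -1) {
            // No delimiter: the whole line becomes the key, value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Sample line is made up for illustration.
        String[] kv = split("user42\tclicked the button", '\t');
        System.out.println("key=" + kv[0] + ", value=" + kv[1]);
    }
}
```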

Edit: below is a short list of input formats and their key/value types:

KeyValueTextInputFormat  Text/Text

NLineInputFormat         LongWritable/Text

FixedLengthInputFormat   LongWritable/BytesWritable

Beyond these, a few input formats, such as SequenceFileInputFormat and CombineFileInputFormat, take generics-based custom key/value types upon declaration. See the Input Formats chapter of Hadoop: The Definitive Guide.

Hope this helps.

Mithlesh Panchal

The JobConf class returns LongWritable as the default output key class if you do not set it explicitly with

job.setOutputKeyClass(...)

From the JobConf source:

public Class<?> getOutputKeyClass() {
    return getClass(JobContext.OUTPUT_KEY_CLASS,
                    LongWritable.class, Object.class);
}
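The mechanism behind that getter is an ordinary configuration lookup with a fallback. A toy sketch of the idea in plain Java (not Hadoop's actual Configuration class; the property name mirrors JobContext.OUTPUT_KEY_CLASS, and Long/String stand in for LongWritable/Text):

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultClassLookup {
    // Toy stand-in for Hadoop's Configuration: a map of property names
    // to configured classes.
    static Map<String, Class<?>> conf = new HashMap<>();

    // Mimics Configuration.getClass(name, defaultValue): return the
    // configured class if the property was set, else the default.
    static Class<?> getClass(String name, Class<?> defaultValue) {
        return conf.getOrDefault(name, defaultValue);
    }

    public static void main(String[] args) {
        // Nothing configured yet -> falls back to the default
        // (Long here stands in for LongWritable).
        System.out.println(getClass("mapreduce.job.output.key.class", Long.class).getSimpleName());

        // After the driver sets the output key class, the lookup changes
        // (String stands in for Text).
        conf.put("mapreduce.job.output.key.class", String.class);
        System.out.println(getClass("mapreduce.job.output.key.class", Long.class).getSimpleName());
    }
}
```

This is why you see LongWritable appear "by default" when the driver never calls the setter: the getter simply falls back to LongWritable.class.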