Key of object type in the hadoop mapper

Submitted by 笑着哭i on 2019-12-19 09:04:54

Question


I am new to Hadoop and trying to understand the MapReduce WordCount example code from here.

The mapper in the documentation is declared as:

Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

In the MapReduce WordCount example, the map method is declared as follows:

public void map(Object key, Text value, Context context)

Question: what is the point of this key of type Object? If the input to a mapper is a text document, I assume the input value would be the chunk of text (64 MB or 128 MB) that Hadoop has partitioned and stored in HDFS. More generally, what is the use of the input key KEYIN in the map code?

Any pointers would be greatly appreciated.


Answer 1:


InputFormat describes the input specification for a MapReduce job. By default, Hadoop uses TextInputFormat, which extends FileInputFormat, to process the input files.

You can also specify the input format explicitly in the client or driver code:

job.setInputFormatClass(SomeInputFormat.class);

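With TextInputFormat, files are broken into lines, and each line becomes one record. The key is the byte offset at which the line starts in the file, and the value is the text of that line. So the mapper is called once per line, not once per HDFS block.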
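To make the key/value pairing concrete, here is a minimal plain-Java sketch that imitates the records described above. It does not use any Hadoop API; the class name OffsetKeyDemo and the method toRecords are made up for illustration, and it assumes UTF-8 input with '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: imitates (byte offset, line) records only.
// It is not Hadoop code and ignores splits, compression and "\r\n" endings.
final class OffsetKeyDemo {

    // Maps the starting byte offset of each line to the line's text.
    static Map<Long, String> toRecords(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                records.put((long) start,
                        new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1; // next line begins right after the newline
            }
        }
        if (start < bytes.length) { // final line without a trailing newline
            records.put((long) start,
                    new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "hello world\nhello hadoop\nbye\n";
        // Prints one "offset<TAB>line" pair per line: offsets 0, 12 and 25.
        toRecords(input).forEach((offset, line) ->
                System.out.println(offset + "\t" + line));
    }
}
```

Each pair printed here corresponds to one call of map(key, value, context) in a real job: the first element plays the role of the key and the second the role of the value.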

In public void map(Object key, Text value, Context context), key is the byte offset of the line within the file and value is the text of that line.

See the TextInputFormat API documentation: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html

With TextInputFormat, the key is of type LongWritable and the value is of type Text. The WordCount mapper declares its input key type as Object, which works because every LongWritable is an Object and the example never reads the key. You can declare the key as LongWritable instead of Object, which is what you want if you need to use the offset.
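The reason Object is acceptable can be shown with a small plain-Java sketch. MiniMapper, TokenMapper and ObjectKeyDemo are hypothetical stand-ins, not Hadoop classes, and a plain Long stands in for LongWritable; the sketch only assumes that the mapper ignores its key, as the WordCount example does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: feeds (offset, line) pairs to a mapper whose key type is Object.
final class ObjectKeyDemo {

    static List<String> run() {
        List<String> out = new ArrayList<>();
        MiniMapper<Object, String> mapper = new TokenMapper();
        mapper.map(0L, "hello world", out);   // the Long key is accepted as an Object
        mapper.map(12L, "hello hadoop", out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [hello, world, hello, hadoop]
    }
}

// Hypothetical, simplified stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
abstract class MiniMapper<KEYIN, VALUEIN> {
    abstract void map(KEYIN key, VALUEIN value, List<String> out);
}

// Declares KEYIN as Object: the key is received but never read.
final class TokenMapper extends MiniMapper<Object, String> {
    @Override
    void map(Object key, String value, List<String> out) {
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
    }
}
```

Declaring the key with the narrower type would only matter if the mapper needed the offset itself, for example to skip a header line at offset 0.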



Source: https://stackoverflow.com/questions/29063844/key-of-object-type-in-the-hadoop-mapper
