What are the main differences between KeyValueTextInputFormat and TextInputFormat in Hadoop?

痴心易碎 submitted on 2020-01-16 01:13:17

Question


Can somebody give me one practical scenario each where we have to use KeyValueTextInputFormat and TextInputFormat?


Answer 1:


The TextInputFormat class converts every line of the source file into a key/value pair, where the LongWritable key is the byte offset of the line within the file and the Text value is the entire line itself.

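KeyValueTextInputFormat also reads the file line by line, but it is useful when every source record should arrive as a Text/Text pair. The key and value are populated by splitting the line at the first occurrence of a fixed separator (a tab by default): the text before the separator becomes the key and the text after it becomes the value. If a line contains no separator, the whole line becomes the key and the value is empty.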

Consider the following file contents:

AL#Alabama
AR#Arkansas
FL#Florida

If TextInputFormat is configured, you might see the key/value pairs as follows (assuming each line ends with a single newline byte):

0    AL#Alabama
11   AR#Arkansas
23   FL#Florida
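
To make it concrete where the TextInputFormat keys come from, here is a minimal plain-Java sketch that recomputes the byte offset at which each line of the sample file starts. It does not use Hadoop at all; the class and method names (OffsetSketch, lineStartOffsets) are made up for illustration, and it assumes UTF-8 text with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the keys TextInputFormat would produce,
// without using any Hadoop classes.
public class OffsetSketch {

    /** Returns the byte offset at which each line of content starts. */
    public static long[] lineStartOffsets(String content) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        // split("\n") drops trailing empty strings, so a final newline
        // does not produce an extra empty record.
        for (String line : content.split("\n")) {
            offsets.add(position);
            // Advance by the line's byte length plus one byte for '\n'.
            position += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        String file = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";
        String[] lines = file.split("\n");
        long[] offsets = lineStartOffsets(file);
        for (int i = 0; i < lines.length; i++) {
            System.out.println(offsets[i] + "\t" + lines[i]);
        }
        // prints offsets 0, 11 and 23 for the three lines
    }
}
```

With "\r\n" line endings every line would be one byte longer, so the offsets would shift accordingly.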

If KeyValueTextInputFormat is configured with conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "#"), you might see the results as:

AL    Alabama
AR    Arkansas
FL    Florida
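
The splitting rule can be sketched in plain Java as well. This is only an approximation of what KeyValueTextInputFormat does for each line, not Hadoop's own code; the names KeyValueSplitSketch and splitRecord are hypothetical. It splits at the first occurrence of the separator and, if the separator is missing, uses the whole line as the key with an empty value.

```java
// Illustration only: approximates how a line is turned into a key/value
// pair, without using any Hadoop classes.
public class KeyValueSplitSketch {

    /** Splits line at the first separator; returns {key, value}. */
    public static String[] splitRecord(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key, the value is empty.
            return new String[] { line, "" };
        }
        // Only the first separator splits; later ones stay in the value.
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] lines = { "AL#Alabama", "AR#Arkansas", "FL#Florida" };
        for (String line : lines) {
            String[] kv = splitRecord(line, '#');
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Because only the first separator counts, a line such as "a#b#c" would yield the key "a" and the value "b#c".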



Answer 2:


KeyValueTextInputFormat lets you take the key from the input file itself, whereas TextInputFormat has a fixed key, which is the byte offset of the line.

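Set the separator for KeyValueTextInputFormat on the Configuration before creating the job, since the Job takes its own copy of the configuration:

    Configuration conf = new Configuration();
    conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
    Job job = Job.getInstance(conf);
    job.setInputFormatClass(KeyValueTextInputFormat.class);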

An example of where you can use KeyValueTextInputFormat:

You receive a file that is delimited by a comma or some other single byte, and you know the first column can act as the key. Say, a CSV of salaries with the name or employee ID as the first column and the salary as the second.
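
As a rough illustration of that scenario, the plain-Java sketch below treats the first column as the key and sums the second column per key, which is roughly what a job reading such a file through KeyValueTextInputFormat might go on to compute. Everything here is hypothetical: the class name SalaryByKeySketch, the method totalByKey, and the sample rows are invented for the example, and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: first column is the key, second column is a salary.
public class SalaryByKeySketch {

    /** Sums the numeric second column for each distinct first column. */
    public static Map<String, Long> totalByKey(List<String> lines, char separator) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            int pos = line.indexOf(separator);
            if (pos < 0) {
                continue; // no separator, so there is no salary to read
            }
            String key = line.substring(0, pos);
            long salary = Long.parseLong(line.substring(pos + 1).trim());
            // Add this salary to the running total for the key.
            totals.merge(key, salary, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: employee ID, salary.
        List<String> lines = Arrays.asList("E001,5000", "E002,4200", "E001,300");
        System.out.println(totalByKey(lines, ','));
        // prints {E001=5300, E002=4200}
    }
}
```

With TextInputFormat the same job would receive the byte offset as the key and would have to split each line itself inside the mapper.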

Also refer to this post: How to specify KeyValueTextInputFormat Separator



Source: https://stackoverflow.com/questions/29903987/what-are-the-main-differences-between-keyvaluetextinputformat-and-textinputforma
