InputFormat Decision

Submitted by 我是研究僧i on 2019-12-25 04:08:49

Question


I am trying to figure out which of the given answers best fits this question:

Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij

2	kadfjhuwqounahagtnbvaswslmnbfgy

3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line conf.setInputFormat(____.class); ?

A. SequenceFileAsTextInputFormat

B. SequenceFileInputFormat

C. KeyValueFileInputFormat

D. BDBInputFormat

My analysis:

Option A is a format I found does exist, but I'm not sure how it is used or whether it fits as an answer.

Option B is not possible, since SequenceFiles are files of binary (K,V) pairs, so they are not suitable for this plain-text input.

Option C is not possible because there is no KeyValueFileInputFormat. However, if this is a typo and it is actually KeyValueTextInputFormat, then I think it would be a good choice. Or isn't it?

Option D is not possible because there is no BDBInputFormat, and even if it is a typo for DBInputFormat, that format reads from a relational database rather than from text files, so it would not suit this case.

Thank You! D


Answer 1:


The answer is Option C. The name is probably a typo for KeyValueTextInputFormat.

KeyValueTextInputFormat splits each line at the first TAB character, so the line number becomes the key and the string becomes the value.
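To make this concrete, here is a minimal stdlib-only Java sketch that imitates how KeyValueTextInputFormat turns lines into records. It does not call Hadoop: the class KeyValueSplitSketch and its methods are hypothetical names, and String stands in for Hadoop's Text. It assumes the documented behaviour of splitting at the first separator (a tab by default), with the whole line becoming the key and an empty value when no separator is present.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; NOT Hadoop's KeyValueLineRecordReader itself.
public class KeyValueSplitSketch {

    // Splits one line at the FIRST tab: text before it is the key,
    // text after it is the value. With no tab, the whole line is the key
    // and the value is empty.
    static Map.Entry<String, String> toRecord(String line) {
        int pos = line.indexOf('\t');
        if (pos == -1) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    // One input line -> one (key, value) record, i.e. one map() call per line.
    static List<Map.Entry<String, String>> toRecords(List<String> lines) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : lines) {
            records.add(toRecord(line));
        }
        return records;
    }

    public static void main(String[] args) {
        // The example lines from the question, with the tab restored.
        List<String> lines = List.of(
            "1\tabialkjfjkaoasdfjksdlkjhqweroij",
            "2\tkadfjhuwqounahagtnbvaswslmnbfgy",
            "3\tkjfteiomndscxeqalkzhtopedkfsikj");
        for (Map.Entry<String, String> r : toRecords(lines)) {
            System.out.println("key=" + r.getKey() + " value=" + r.getValue());
        }
    }
}
```

Running main prints one line per record, the first being key=1 value=abialkjfjkaoasdfjksdlkjhqweroij. Splitting only at the first tab means a value that itself contains tabs is kept intact. In a real old-API job the equivalent step is simply conf.setInputFormat(KeyValueTextInputFormat.class).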




Answer 2:


It may be a typo in option C, as you guessed; it should be KeyValueTextInputFormat: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/KeyValueTextInputFormat.html

For more details, see: How to specify KeyValueTextInputFormat Separator in Hadoop-.20 api?
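The linked question is about changing the separator. As far as I know, the property is key.value.separator.in.input.line in the old mapred API and mapreduce.input.keyvaluelinerecordreader.key.value.separator in the newer mapreduce API, defaulting to a tab, and only the first character of its value is used; treat these details as assumptions to verify against your Hadoop version. The sketch below imitates that lookup with java.util.Properties standing in for a Hadoop Configuration. SeparatorSketch is a hypothetical name, and no Hadoop classes are used.

```java
import java.util.Properties;

// Hypothetical simulation of a configurable key/value separator.
public class SeparatorSketch {

    // Assumed newer-API property name (old mapred API:
    // "key.value.separator.in.input.line"); check against your version.
    static final String SEPARATOR_KEY =
        "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Reads the separator from Properties (standing in for a Configuration),
    // defaults to tab, and keeps only the first character.
    // Assumes a non-empty value if the property is set.
    static char separatorFrom(Properties conf) {
        String sep = conf.getProperty(SEPARATOR_KEY, "\t");
        return sep.charAt(0);
    }

    // Splits at the first occurrence of sep; with no sep, the whole line is the key.
    static String[] split(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos == -1) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SEPARATOR_KEY, ","); // e.g. comma-separated input instead of tabs
        char sep = separatorFrom(conf);
        String[] kv = split("1,abialkjfjkaoasdfjksdlkjhqweroij", sep);
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

For the tab-separated file in the question nothing needs to be set, since the tab is already the default.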



Source: https://stackoverflow.com/questions/27930385/inputformat-decision
