InputFormat Decision

Question

I am trying to figure out which of the given answers best fits the following question:

Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij

2	kadfjhuwqounahagtnbvaswslmnbfgy

3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line conf.setInputFormat(____.class);?

A. SequenceFileAsTextInputFormat

B. SequenceFileInputFormat

C. KeyValueFileInputFormat

D. BDBInputFormat

My analysis:

Option A is a format that I found does exist, but I'm not sure how it is used correctly or whether it fits as an answer.

Option B is not possible, since SequenceFiles are binary files of (K, V) pairs and thus would not be suitable here.

Option C is not possible because there is no KeyValueFileInputFormat. However, if it is a typo and is actually KeyValueTextInputFormat, then I think it would be a good choice. Or wouldn't it?

Option D is not possible because there is no BDBInputFormat, and even if it is a typo and is actually BDInputFormat, it wouldn't suit this case.

Thank you!

Answer 1:

The answer is option C. The name is probably a typo for KeyValueTextInputFormat.

KeyValueTextInputFormat splits each line at the tab character (its default separator), so the line number becomes the key and the string becomes the value.
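To make the key/value split concrete, here is a minimal plain-Java sketch of what happens to each line. It does not use Hadoop at all: the class KeyValueSplitSketch and the method splitAtFirstTab are hypothetical names made up for illustration, and the code only mimics the behavior described above (split at the first tab; a line without a tab becomes the key with an empty value).

```java
public class KeyValueSplitSketch {

    // Hypothetical helper, NOT a Hadoop API: mimics splitting a line
    // into a (key, value) pair at the first tab character.
    public static String[] splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            // No separator found: the whole line is the key, the value is empty.
            return new String[] { line, "" };
        }
        // Key is everything before the first tab, value is everything after it.
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Sample records in the layout from the question: line number, tab, string.
        String[] lines = {
            "1\tabialkjfjkaoasdfjksdlkjhqweroij",
            "2\tkadfjhuwqounahagtnbvaswslmnbfgy",
            "3\tkjfteiomndscxeqalkzhtopedkfsikj"
        };
        for (String line : lines) {
            String[] kv = splitAtFirstTab(line);
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

In a real job the framework does this for you once the blank in the question is filled in as conf.setInputFormat(KeyValueTextInputFormat.class); the Mapper then receives both key and value as Text.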

Answer 2:

It may be a typo in option C, as you guessed; it should be KeyValueTextInputFormat: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/KeyValueTextInputFormat.html.

For more details, see: How to specify KeyValueTextInputFormat Separator in Hadoop-.20 api?

Source: https://stackoverflow.com/questions/27930385/inputformat-decision

Tags

Hadoop

MapReduce