Question
I am trying to figure out which of the given answers best fits this question:
Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line conf.setInputFormat(____.class);?
A. SequenceFileAsTextInputFormat
B. SequenceFileInputFormat
C. KeyValueFileInputFormat
D. BDBInputFormat
My analysis:
Option A is a format that I found does exist, but I'm not sure how it is used correctly or whether it fits as an answer.
Option B is not suitable, since SequenceFiles are binary files of (K, V) pairs, whereas the input here is plain text.
Option C is not possible because there is no KeyValueFileInputFormat. However, if it is a typo and is actually KeyValueTextInputFormat, then I think it would be a good choice. Or isn't it?
Option D is not possible because there is no BDBInputFormat, and even if it is a typo for BDInputFormat, it wouldn't suit this case.
Thank you!
Answer 1:
The answer is Option C, which is probably a typo for
KeyValueTextInputFormat.
It splits each line at the first tab character,
so the line number will be the key and the string will be the value.
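To make the split concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of what I understand KeyValueTextInputFormat to do with each line: cut at the first tab, with the left part as the key and the rest as the value. The class name KeyValueSplitSketch and the method split are hypothetical, made up for this illustration; they are not part of the Hadoop API. In a real job, the driver line would be conf.setInputFormat(KeyValueTextInputFormat.class), and the Mapper would receive both key and value as Text.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper for illustration only; not a Hadoop class.
class KeyValueSplitSketch {

    // Splits a line at the FIRST occurrence of the separator.
    // If the separator is absent, the whole line becomes the key and the
    // value is empty (assumed to mirror KeyValueTextInputFormat's behavior).
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Sample lines from the question: line number, tab, string.
        String[] lines = {
            "1\tabialkjfjkaoasdfjksdlkjhqweroij",
            "2\tkadfjhuwqounahagtnbvaswslmnbfgy",
            "3\tkjfteiomndscxeqalkzhtopedkfsikj"
        };
        for (String line : lines) {
            Map.Entry<String, String> record = split(line, '\t');
            System.out.println("key=" + record.getKey() + " value=" + record.getValue());
        }
    }
}
```

Each input line still reaches the Mapper as exactly one record; the only difference from a plain text input format is that the record arrives already divided into key and value.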
Answer 2:
It may be a typo in option C, as you guessed, and it should be KeyValueTextInputFormat: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/KeyValueTextInputFormat.html.
For more details, see: How to specify KeyValueTextInputFormat Separator in Hadoop-.20 api?
Source: https://stackoverflow.com/questions/27930385/inputformat-decision