This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of the form <k,v>, where k is the offset of the line from the beginning of the file and v is the content of the line.
I think what Deepak was asking was more about how the input for each call of the map() function is determined, rather than about the data present on each map node. I say this based on the second part of the question: "More specifically, each time the map() function is called, what are its Key key and Value val parameters?"
Actually, the same question brought me here, and had I been an experienced Hadoop developer, I might have interpreted it the way the answers above do.
To answer the question:

The file at a given map node is split into records according to the InputFormat we set for the job (in the Java API, this is done with setInputFormat()).
An example:
conf.setInputFormat(TextInputFormat.class);

Here, by passing TextInputFormat to setInputFormat(), we are telling Hadoop to treat each line of the input file at the map node as one input to the map function. A linefeed or carriage return signals the end of a line (see the TextInputFormat documentation for more details).
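For concreteness, here is a minimal driver sketch using the old org.apache.hadoop.mapred API (the one setInputFormat() belongs to). The class name, job name, and the use of command-line arguments for the paths are illustrative assumptions, not part of the original question:

    // Minimal driver sketch: configure a job so each map() call receives one line.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class TextInputDriver {                      // hypothetical class name
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TextInputDriver.class);
            conf.setJobName("text-input-demo");         // hypothetical job name

            // Each map() call gets one line: key = byte offset, value = line text.
            conf.setInputFormat(TextInputFormat.class);

            // With no mapper/reducer set, Hadoop's identity classes pass the
            // (offset, line) pairs straight through to the output.
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

With this configuration the job simply copies each (offset, line) pair to the output, which makes it easy to see exactly what the map function is being handed.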
In this example, the keys are the byte offsets at which each line starts in the file (type LongWritable), and the values are the text of the lines themselves (type Text).
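A matching mapper sketch, again with the old org.apache.hadoop.mapred API (the class name is hypothetical), shows those parameter types in the map() signature:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> output,
                        Reporter reporter) throws IOException {
            // key   = byte offset of this line from the start of the file
            // value = the content of the line (without the line terminator)
            output.collect(key, value);
        }
    }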
Hope this helps.