This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of
HDFS splits the file into blocks when it is written; there is no separate MapReduce job for that, and the block size is controlled by dfs.blocksize (128 MB by default in Hadoop 2 and later). When a MapReduce job runs, the InputFormat computes logical input splits over those blocks: use a FileInputFormat subclass (e.g. TextInputFormat) for large files and CombineFileInputFormat when you have many small files. You can check whether an input can be split at all via the isSplitable() method. Each split is then processed by a map task, which the framework tries to schedule on a node that already holds the corresponding block (data locality). The split size depends on the block size together with mapred.max.split.size and mapred.min.split.size (mapreduce.input.fileinputformat.split.maxsize / .minsize in the new API); the framework uses max(minSize, min(maxSize, blockSize)).
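Here is a minimal sketch (using the new org.apache.hadoop.mapreduce API) of how split sizes are typically configured on a job; the input path, job name, and size values are just placeholders for illustration, and the mapper/reducer setup is omitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "line-analysis"); // hypothetical job name

            // The logical split size is bounded by these two settings together
            // with the HDFS block size: max(minSize, min(maxSize, blockSize)).
            FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB

            // TextInputFormat is a FileInputFormat; swap in CombineFileInputFormat
            // (via a concrete subclass) when the input is many small files.
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path("/data/big-file.txt")); // placeholder path

            // Mapper/reducer classes would be set here; this sketch only shows split configuration.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

If you need to prevent splitting entirely (for example for a non-splittable compression codec), you would subclass the input format and override isSplitable() to return false.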