How does Hadoop perform input splits?


礼貌的吻别 2020-11-30 23:18

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of

10 Answers

旧时难觅i (OP)

2020-12-01 00:18

Files are split into HDFS blocks, and each block is replicated on several nodes. Hadoop assigns a node to each split according to the data locality principle: it tries to run the mapper on a node where the block resides. Because of replication, several nodes host the same block, so there is usually more than one candidate.
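To make the block layout concrete, here is a minimal sketch of how a file's byte range divides into fixed-size blocks. It is not Hadoop code. The 128 MB block size is an assumption (it matches the default `dfs.blocksize` in Hadoop 2 and later, but it is configurable), and `block_ranges` is a hypothetical helper name. In the simplest case each block becomes one input split, although the real input format can adjust split boundaries.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumption: default dfs.blocksize in Hadoop 2+, configurable


def block_ranges(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for every block of a file of file_size bytes.

    Hypothetical helper for illustration; every block is full-sized
    except possibly the last one.
    """
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


if __name__ == "__main__":
    # A 300 MB file occupies two full blocks plus one partial block.
    for offset, length in block_ranges(300 * MB):
        print(f"offset={offset // MB} MB, length={length // MB} MB")
```

Each of these blocks is then stored on several nodes, which is what gives the scheduler a choice of where to run the mapper.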

If none of those nodes is available, Hadoop tries to pick a node as close as possible to one that hosts the data block, for example another node in the same rack. A node may be unavailable for various reasons: all of its map slots may be in use, or the node may simply be down.
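The preference order described above (a node holding the block, then a node in the same rack, then any other node) can be sketched as follows. This is an illustration only, not Hadoop's scheduler: the function `pick_node`, the node and rack names, and the `rack_of` mapping are all hypothetical, and the real scheduler weighs more factors than this.

```python
def pick_node(replica_hosts, free_nodes, rack_of):
    """Choose a node for a map task, preferring data locality.

    replica_hosts: nodes that store a replica of the block
    free_nodes:    nodes that are up and have a free map slot, in scan order
    rack_of:       mapping from node name to rack name
    Returns (node, locality_level), or (None, "none") if nothing is free.
    """
    # 1. Node-local: a free node that already holds the block.
    for node in free_nodes:
        if node in replica_hosts:
            return node, "node-local"

    # 2. Rack-local: a free node in the same rack as some replica.
    replica_racks = {rack_of[host] for host in replica_hosts}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"

    # 3. Off-rack: any free node; the block is read across racks.
    for node in free_nodes:
        return node, "off-rack"

    return None, "none"


if __name__ == "__main__":
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
    replicas = ["n1", "n3"]
    print(pick_node(replicas, ["n2", "n3"], rack_of))  # n3 holds a replica
    print(pick_node(replicas, ["n2", "n4"], rack_of))  # n2 shares rack r1 with n1
    print(pick_node(replicas, ["n4"], rack_of))        # only an off-rack node is free
```

A node that is down or has no free map slot simply does not appear in `free_nodes`, which is how the sketch models the unavailability cases mentioned in the answer.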
