How does Hadoop perform input splits?

礼貌的吻别 · 2020-11-30 23:18

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of …

10 Answers
  •  刺人心
     2020-12-01 00:01

    The InputFormat is responsible for providing the splits.
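
    As a rough illustration, here is a minimal driver-side sketch (assuming the newer org.apache.hadoop.mapreduce API; the class name and the 128/256 MB values are just examples) of where split configuration lives. FileInputFormat computes the split size roughly as max(minSize, min(maxSize, blockSize)), so with the defaults one split corresponds to one HDFS block.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class SplitConfigSketch {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "split-config-sketch");

                // TextInputFormat (the usual choice for line-oriented text)
                // inherits split calculation from FileInputFormat.getSplits(),
                // which uses splitSize = max(minSize, min(maxSize, blockSize)).
                job.setInputFormatClass(TextInputFormat.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // Optional hints: ask for splits larger or smaller than a block.
                FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
                FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB

                // Mapper/Reducer classes and job.waitForCompletion() are omitted;
                // this sketch only shows where split sizing is configured.
            }
        }

    With the default settings, a file stored as, say, 8 HDFS blocks would produce 8 splits and therefore 8 map tasks, regardless of how many nodes the cluster has.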

    In general, if you have n nodes, HDFS will distribute the file's blocks (and their replicas) across those nodes. When you start a job, Hadoop creates one mapper per input split by default, and with the default settings one split corresponds to one HDFS block. The scheduler then tries to run each mapper on a node that holds a copy of that split's data; this is called data locality (rack awareness is the related mechanism that tells Hadoop which node and rack each replica lives on).
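
    To see what the splits look like, you can ask the InputFormat for them yourself, just as the framework does before scheduling map tasks. The sketch below (again assuming the mapreduce API; the class name is illustrative and the input path comes from the command line) prints each split's file, offset, length and the hosts storing its data, which is the information the scheduler uses for locality.

        import java.util.Arrays;
        import java.util.List;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.FileSplit;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class ListSplits {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "list-splits");
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // One map task is launched per element of this list.
                List<InputSplit> splits = new TextInputFormat().getSplits(job);
                for (InputSplit split : splits) {
                    FileSplit fileSplit = (FileSplit) split;
                    System.out.printf("%s offset=%d length=%d hosts=%s%n",
                            fileSplit.getPath(),
                            fileSplit.getStart(),
                            fileSplit.getLength(),
                            Arrays.toString(fileSplit.getLocations()));
                }
            }
        }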

    So, to make a long story short: upload the data to HDFS and start a MapReduce job; Hadoop will take care of the optimised execution.
