How does Hadoop perform input splits?

礼貌的吻别 · 2020-11-30 23:18

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of …

10 Answers
  •  刺人心
     2020-12-01 00:01

    The InputFormat is responsible for providing the splits.
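
    As a rough illustration, here is a minimal driver-side sketch (assuming the newer org.apache.hadoop.mapreduce API; the class name and the 128/256 MB values are just examples) of where split configuration lives. FileInputFormat computes the split size roughly as max(minSize, min(maxSize, blockSize)), so with the defaults one split corresponds to one HDFS block.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class SplitConfigSketch {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "split-config-sketch");

                // TextInputFormat (the usual choice for line-oriented text)
                // inherits split calculation from FileInputFormat.getSplits(),
                // which uses splitSize = max(minSize, min(maxSize, blockSize)).
                job.setInputFormatClass(TextInputFormat.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // Optional hints: ask for splits larger or smaller than a block.
                FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
                FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB

                // Mapper/Reducer classes and job.waitForCompletion() are omitted;
                // this sketch only shows where split sizing is configured.
            }
        }

    With the default settings, a file stored as, say, 8 HDFS blocks would produce 8 splits and therefore 8 map tasks, regardless of how many nodes the cluster has.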

    In general, if you have n nodes, HDFS will distribute the file's blocks (and their replicas) across those nodes. When you start a job, Hadoop creates one mapper per input split by default, and with the default settings one split corresponds to one HDFS block. The scheduler then tries to run each mapper on a node that holds a copy of that split's data; this is called data locality (rack awareness is the related mechanism that tells Hadoop which node and rack each replica lives on).
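
    To see what the splits look like, you can ask the InputFormat for them yourself, just as the framework does before scheduling map tasks. The sketch below (again assuming the mapreduce API; the class name is illustrative and the input path comes from the command line) prints each split's file, offset, length and the hosts storing its data, which is the information the scheduler uses for locality.

        import java.util.Arrays;
        import java.util.List;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.FileSplit;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class ListSplits {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "list-splits");
                FileInputFormat.addInputPath(job, new Path(args[0]));

                // One map task is launched per element of this list.
                List<InputSplit> splits = new TextInputFormat().getSplits(job);
                for (InputSplit split : splits) {
                    FileSplit fileSplit = (FileSplit) split;
                    System.out.printf("%s offset=%d length=%d hosts=%s%n",
                            fileSplit.getPath(),
                            fileSplit.getStart(),
                            fileSplit.getLength(),
                            Arrays.toString(fileSplit.getLocations()));
                }
            }
        }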

    So, to make a long story short: upload the data to HDFS and start a MapReduce job; Hadoop will take care of the optimised execution.
