How does Hadoop perform input splits?

礼貌的吻别 2020-11-30 23:18

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of

10 Answers
  •  不知归路
    2020-12-01 00:16

    FileInputFormat.addInputPath(job, new Path(args[0]));

    or

    conf.setInputFormat(TextInputFormat.class);

    The FileInputFormat methods addInputPath and setInputFormat take care of the input splits, and this code also determines the number of mappers that get created. We can say that the number of input splits, and hence the number of mappers, is directly proportional to the number of blocks used to store the input file on HDFS.

    Ex.: if we have an input file of size 74 MB, it is stored on HDFS in two blocks (64 MB and 10 MB). So there are two input splits for this file, and two mapper instances are created to read it.
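    The split arithmetic above can be sketched in plain Python. This is a simplified model, not Hadoop itself: it mirrors the logic of FileInputFormat's computeSplitSize (max(minSize, min(maxSize, blockSize))) and its getSplits loop, including the 1.1 "split slop" factor Hadoop uses so a tiny trailing remainder is merged into the last split rather than becoming its own split. The 64 MB block size is the old HDFS default assumed by the answer.

    ```python
    def compute_split_size(block_size, min_size=1, max_size=float("inf")):
        # Mirrors FileInputFormat.computeSplitSize:
        # split size = max(minSize, min(maxSize, blockSize))
        return max(min_size, min(max_size, block_size))

    def get_splits(file_size, block_size, split_slop=1.1):
        """Return (offset, length) pairs, mimicking FileInputFormat.getSplits.

        While more than split_slop * split_size bytes remain, carve off a
        full split; whatever is left becomes one final (smaller) split.
        """
        split_size = compute_split_size(block_size)
        splits = []
        remaining = file_size
        while remaining / split_size > split_slop:
            splits.append((file_size - remaining, split_size))
            remaining -= split_size
        if remaining > 0:
            splits.append((file_size - remaining, remaining))
        return splits

    MB = 1024 * 1024
    # 74 MB file with 64 MB blocks -> two splits: (0, 64 MB) and (64 MB, 10 MB),
    # matching the two-mapper example above.
    splits = get_splits(74 * MB, 64 * MB)
    print(splits)
    ```

    Note that the real FileInputFormat also consults mapreduce.input.fileinputformat.split.minsize / split.maxsize, so split size need not equal block size; the sketch exposes those as min_size / max_size defaults.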
