How does Hadoop perform input splits?

礼貌的吻别 2020-11-30 23:18

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. For the sake of simplicity, let's consider that each line is of

10 answers
  •  清歌不尽
    2020-12-01 00:08

    FileInputFormat is the abstract class that defines how input files are read and split up. It provides the following functionality: 1. It selects the files or objects that should be used as input. 2. It defines the InputSplits that break a file into tasks.

    In Hadoop's basic model, if there are n splits, there will be n mappers.
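
To make the split-to-mapper relationship concrete, here is a minimal sketch in plain Java of how a single splittable file might be carved into splits. It borrows the sizing formula `max(minSize, min(maxSize, blockSize))` and the 1.1 slack factor from Hadoop's `FileInputFormat`, but it is a simplified stand-in, not the library's code: the class name `SplitSketch`, the 128 MiB block size, and the 1 GiB file length are assumptions made for illustration, and block locations and compression are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Split size formula: max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {offset, length} pairs covering a splittable file of fileLength bytes. */
    static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly over one split size) becomes the last split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;    // assumed 128 MiB HDFS block
        long fileLength = 1024L * 1024 * 1024;  // hypothetical 1 GiB input file
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        List<long[]> splits = getSplits(fileLength, splitSize);
        System.out.println("split size = " + splitSize + " bytes");
        System.out.println("splits (one map task each) = " + splits.size());
    }
}
```

With these assumed sizes the file yields 8 splits and therefore 8 map tasks. Because of the slack factor, a 130 MiB file would stay as a single split rather than producing a tiny second one.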
