About Hadoop/HDFS file splitting

你的背包 2020-12-07 23:41

I just want to confirm the following. Please verify whether this is correct: 1. As I understand it, when we copy a file into HDFS, that is the point at which the file (assuming its size

3 answers
  •  星月不相逢
    2020-12-07 23:59

    Your understanding is not quite right. There are two almost independent processes: splitting files into HDFS blocks, and splitting files for processing by different mappers.
    HDFS splits files into blocks based on the configured block size.
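To make the block arithmetic concrete, here is a minimal sketch in plain Python (it does not call any Hadoop API). The function name `hdfs_block_lengths` is hypothetical, and the 128 MiB default is an assumption matching the usual `dfs.blocksize` default in Hadoop 2 and later; your cluster may be configured differently.

```python
MIB = 1024 * 1024

def hdfs_block_lengths(file_size, block_size=128 * MIB):
    """Return the length in bytes of each block a file would occupy.

    Hypothetical helper for illustration only: every block is full
    except possibly the last one, which holds only the leftover bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)
    return lengths

# A 300 MiB file with 128 MiB blocks: two full blocks plus a 44 MiB tail.
print([n // MIB for n in hdfs_block_lengths(300 * MIB)])
```

Note that the last block only takes up as much space as the data it holds, so a small file does not consume a full block on disk.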
    Each input format has its own logic for how files are split into parts for independent processing by different mappers. The default logic of FileInputFormat is to split files by HDFS block. You can implement any other logic.
    Compression is usually the enemy of splitting, so block compression is used to make compressed data splittable: each logical part of the file (block) is compressed independently.
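The idea of block compression can be sketched with Python's standard `zlib` module. This is not how any particular Hadoop file format or codec lays out its data; the helpers `compress_in_blocks` and `read_block` and the 4096-byte block size are hypothetical and only demonstrate that independently compressed blocks can be read without touching their neighbours.

```python
import zlib

def compress_in_blocks(data, block_size):
    """Compress each block_size-byte chunk of data independently."""
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]

def read_block(blocks, index):
    """Decompress one block without reading any of the others."""
    return zlib.decompress(blocks[index])

data = b"abcdefghij" * 1000            # 10,000 bytes of sample input
blocks = compress_in_blocks(data, 4096)

# A reader assigned only the second block can decode it on its own.
middle = read_block(blocks, 1)
print(len(blocks), middle == data[4096:8192])
```

By contrast, if the whole file were compressed as a single stream, a mapper starting in the middle would have no valid point at which to begin decompressing, which is why such a file has to be processed by one mapper.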
