I just want to confirm the following. Please verify whether this is correct: 1. As per my understanding, when we copy a file into HDFS, that is the point when the file (assuming its size
Yes, the file contents are split into chunks when the file is copied into HDFS. The block size is configurable; if it is, say, 128 MB, then a file of up to 128 MB occupies a single block, not two separate 64 MB blocks. Also, it is not necessary that each chunk of a file is stored on a separate datanode: a datanode may hold more than one chunk of a particular file, and a particular chunk may be present on more than one datanode, depending on the replication factor.
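To make the arithmetic concrete, here is a small illustrative sketch (not actual HDFS code; the function name, defaults, and dict layout are my own) of how a file's size maps to blocks and replicas:

```python
import math

def plan_blocks(file_size_mb, block_size_mb=128, replication_factor=3):
    """Illustrative sketch: split a file of file_size_mb into HDFS-style
    blocks of block_size_mb, each replicated replication_factor times.
    Not an HDFS API -- just the splitting arithmetic."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    blocks = []
    for i in range(num_blocks):
        # The last block may be smaller than the configured block size.
        size = min(block_size_mb, file_size_mb - i * block_size_mb)
        blocks.append({"block": i, "size_mb": size, "replicas": replication_factor})
    return blocks

# A 300 MB file with a 128 MB block size yields 3 blocks: 128, 128, 44 MB,
# while a 128 MB file yields exactly one block (not two 64 MB blocks).
print(plan_blocks(300))
print(plan_blocks(128))
```

With a replication factor of 3, each of those blocks exists on three datanodes, and nothing stops two different blocks of the same file from landing on the same datanode.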