Is there any memory loss in HDFS if we use small files?

醉梦人生 2021-01-03 15:42

The following quote is from Hadoop: The Definitive Guide: "Note, however, that small files do not take up any more disk space than is required to store the raw contents."

3 Answers
  •  醉酒成梦
    2021-01-03 16:45

    1. See Kumar's answer.
    2. Depending on your use case, you could look into SequenceFiles or HAR files. A HAR file is analogous to a tar archive, and MapReduce can act on each HAR file with a little overhead. A SequenceFile is essentially a container of key/value pairs; the benefit is that a map task can act on each of these pairs.

    HAR Files

    Sequence Files

    More About Sequence Files
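
To make the "container of key/value pairs" idea concrete, here is a minimal sketch in plain Python. It is not the real SequenceFile format or the Hadoop API; the record layout (two 4-byte lengths, then key and value bytes), the function names `pack_records` and `iter_records`, and the sample file names are all assumptions made up for illustration. It only shows the general shape: many small files go into one container keyed by file name, and a map-style step then handles one pair at a time.

```python
import io
import struct


def pack_records(pairs, out):
    """Append each (key, value) pair to `out` as a length-prefixed record."""
    for key, value in pairs:
        key_bytes = key.encode("utf-8")
        # Hypothetical layout: two 4-byte big-endian lengths, then key, then value.
        out.write(struct.pack(">II", len(key_bytes), len(value)))
        out.write(key_bytes)
        out.write(value)


def iter_records(src):
    """Yield (key, value) pairs back from a stream written by pack_records."""
    while True:
        header = src.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(">II", header)
        key_bytes = src.read(key_len)
        value = src.read(value_len)
        if len(key_bytes) != key_len or len(value) != value_len:
            raise ValueError("truncated record body")
        yield key_bytes.decode("utf-8"), value


# Hypothetical small files: name -> raw contents.
small_files = [("a.log", b"first"), ("b.log", b"second file"), ("c.log", b"")]

# Pack all of them into one in-memory container instead of three separate files.
container = io.BytesIO()
pack_records(small_files, container)
container.seek(0)

# A map-style step: process one pair at a time, here just measuring each value.
sizes = {name: len(data) for name, data in iter_records(container)}
print(sizes)
```

In a real job you would use Hadoop's own SequenceFile reader and writer rather than a hand-rolled layout like this one; the sketch is only meant to show why a single container of pairs is convenient for per-record processing.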
