Avro vs. Parquet

自闭症患者 2020-12-07 09:39

I'm planning to use one of the Hadoop file formats for my Hadoop-related project. I understand Parquet is efficient for column-based queries and Avro for full

7 answers
  •  不知归路
    2020-12-07 09:57

    Avro

    • Widely used as a serialization platform
    • Row-based, offers a compact and fast binary format
    • Schema is stored in the file itself, so the data can be untagged
    • Files support block compression and are splittable
    • Supports schema evolution
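
To make the schema evolution point concrete, here is a minimal sketch in plain Python. It only imitates the idea of resolving an old record against a newer reader schema; it does not use the Avro library, and the names `READER_SCHEMA`, `REQUIRED`, and `resolve` are made up for this illustration.

```python
# Illustrative only: imitates Avro-style schema resolution with plain dicts.
# None of these names come from the real Avro API.

REQUIRED = object()  # sentinel: the reader field has no default value

# Hypothetical reader schema: field name -> default used when the
# writer's record does not contain that field.
READER_SCHEMA = {
    "id": REQUIRED,
    "name": REQUIRED,
    "email": "unknown",  # field added after the old data was written
}

def resolve(record, reader_schema):
    """Project a written record onto the reader schema.

    Fields missing from the record take the reader's default; fields the
    reader does not know about are dropped; a missing field without a
    default is an error.
    """
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"missing field with no default: {field}")
        else:
            resolved[field] = default
    return resolved

# A record written with an older schema that had an extra 'legacy' field
# and no 'email' field.
old_record = {"id": 1, "name": "Ada", "legacy": True}
resolved = resolve(old_record, READER_SCHEMA)
print(resolved)
```

The old record stays readable under the newer schema because the added field carries a default, which is roughly what lets Avro files written at different times coexist.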

    Parquet

    • Column-oriented binary file format
    • Uses the record shredding and assembly algorithm described in the Dremel paper
    • Each data file contains the values for a set of rows
    • Efficient in terms of disk I/O when specific columns need to be queried
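
The disk I/O point can be pictured with a toy comparison of the two layouts. This is a rough sketch in plain Python, not Parquet itself: real Parquet files add row groups, encodings, and compression, and the sample data and function names below are invented for the example.

```python
# Illustrative only: compares how many stored values a single-column scan
# touches in a row-oriented layout versus a column-oriented layout.

rows = [
    {"id": 1, "name": "Ada", "score": 9.5},
    {"id": 2, "name": "Linus", "score": 8.0},
    {"id": 3, "name": "Grace", "score": 9.9},
]

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

def scan_row_layout(rows, field):
    """Read one field from row storage; every value in each row is touched."""
    touched = 0
    values = []
    for row in rows:
        touched += len(row)  # the whole record has to be read
        values.append(row[field])
    return values, touched

def scan_column_layout(columns, field):
    """Read one field from column storage; only that column is touched."""
    values = list(columns[field])
    return values, len(values)

columns = to_columns(rows)
row_values, row_touched = scan_row_layout(rows, "score")
col_values, col_touched = scan_column_layout(columns, "score")
print(row_touched, col_touched)
```

Both scans return the same scores, but the row layout reads every field of every record while the column layout reads only the requested column. The gap grows with the number of columns, which is why a columnar format suits queries that select a few columns, and a row format suits reads that need whole records.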

    From "Choosing an HDFS data storage format: Avro vs. Parquet and more"
