Avro vs. Parquet

自闭症患者 2020-12-07 09:39

I'm planning to use one of the Hadoop file formats for my Hadoop-related project. I understand Parquet is efficient for column-based queries and Avro for full

7 answers
  •  隐瞒了意图╮
    2020-12-07 10:19

    Your understanding is right. In fact, we ran into a similar situation during a data migration in our DWH. We chose Parquet over Avro because the disk savings were almost double what we got with Avro, and query processing time was much better as well. That said, our queries were mostly aggregations and other column-based operations, so Parquet was predictably the clear winner.

    We are using Hive 0.12 from the CDH distribution. You mentioned you are running into issues with Hive+Parquet; what are they? We did not encounter any.
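
To illustrate why aggregation-style queries tend to favor a columnar layout, here is a rough sketch using only the Python standard library. It does not use the real Avro or Parquet encodings: JSON plus zlib stands in for both, and the record fields (`id`, `country`, `amount`) are made up for the example. The point is only that a column-oriented layout lets a query read the one column it needs, while a row-oriented layout has to read every record in full.

```python
import json
import zlib

# Hypothetical sample data: 10,000 records with three made-up fields.
records = [
    {"id": i, "country": ["US", "DE", "IN"][i % 3], "amount": (i * 7) % 100}
    for i in range(10_000)
]

# Row-oriented layout (Avro-like stand-in): whole records stored one after another.
row_blob = zlib.compress("\n".join(json.dumps(r) for r in records).encode())

# Column-oriented layout (Parquet-like stand-in): each column stored
# and compressed on its own.
column_blobs = {
    name: zlib.compress(json.dumps([r[name] for r in records]).encode())
    for name in ("id", "country", "amount")
}


def sum_amount_from_rows(blob):
    # Every record must be decompressed and parsed to reach a single field.
    lines = zlib.decompress(blob).decode().split("\n")
    return sum(json.loads(line)["amount"] for line in lines)


def sum_amount_from_columns(blobs):
    # Only the "amount" column is decompressed; the other columns are never read.
    return sum(json.loads(zlib.decompress(blobs["amount"]).decode()))


row_total = sum_amount_from_rows(row_blob)
col_total = sum_amount_from_columns(column_blobs)

# Compressed bytes each approach has to read to answer SUM(amount).
bytes_read_rows = len(row_blob)
bytes_read_columns = len(column_blobs["amount"])

print("same answer:", row_total == col_total)
print("bytes read (row layout):   ", bytes_read_rows)
print("bytes read (column layout):", bytes_read_columns)
```

Both approaches return the same total, but the column layout gets there by reading a much smaller slice of the data. For workloads that read or write whole records, this advantage disappears, which matches the trade-off described in the question.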
