What are the pros and cons of parquet format compared to other formats?
Characteristics of Apache Parquet are : Self-describing Columnar format Language-independent In comparison to Avro, Sequence Files, RC File etc. I want an overview of the formats. I have already read : How Impala Works with Hadoop File Formats , it gives some insights on the formats but I would like to know how the access to data & storage of data is done in each of these formats. How parquet has an advantage over the others? I think the main difference I can describe relates to record oriented vs. column oriented formats. Record oriented formats are what we're all used to -- text files,