I am running a few tests on the storage formats available with Hive and using Parquet and ORC as major options. I included ORC once with default compression and once with Sn
Both of them have their advantages. We use Parquet at work together with Hive and Impala, but just wanted to point a few advantages of ORC over Parquet: during long-executing queries, when Hive queries ORC tables GC is called about 10 times less frequently. Might be nothing for many projects, but might be crucial for others.
ORC also takes much less time, when you need to select just a few columns from the table. Some other queries, especially with joins, also take less time because of vectorized query execution, which is not available for Parquet
Also, ORC compression is sometimes a bit random, while Parquet compression is much more consistent. It looks like when ORC table has many number columns - it doesn't compress as well. It affects both zlib and snappy compression