How do I combine or merge small ORC files into a larger ORC file?

星月不相逢 2020-12-03 12:53

Most questions and answers on SO and the web discuss using Hive to combine a bunch of small ORC files into a larger one. However, my ORC files are log files which are separated

2 Answers
  •  囚心锁ツ
    2020-12-03 13:43

    You do not need to re-invent the wheel.

    Since Hive 0.14.0, ALTER TABLE table_name [PARTITION partition_spec] CONCATENATE can merge small ORC files into larger ones. For ORC the merge happens at the stripe level, which avoids decompressing and decoding the data, so it is fast. I'd suggest creating an external table partitioned by day (each partition is a directory), then merging each partition by passing it as the partition spec, e.g. PARTITION (day_column='<day>').
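    A minimal sketch of the steps above, written as Python that only prints the HiveQL to run (for example in beeline). Everything specific here is a hypothetical placeholder, not taken from the question: the table name app_logs, the HDFS location /data/app_logs, the two STRING columns, the partition column log_day, and the assumption that each day's ORC files sit in a directory named <location>/<YYYY-MM-DD>. The helpers daily_partitions and build_hiveql are likewise made up for illustration. Check your Hive version's restrictions on CONCATENATE (for example, whether it accepts external tables) before relying on this.

```python
from datetime import date, timedelta


def daily_partitions(start: date, end: date) -> list[str]:
    """Return 'YYYY-MM-DD' partition values for every day in [start, end]."""
    n_days = (end - start).days + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


def build_hiveql(table: str, location: str, part_col: str, days: list[str]) -> list[str]:
    """Emit HiveQL: an external ORC table, one partition per day directory,
    and one CONCATENATE per partition (ORC stripe-level merge, Hive 0.14.0+).

    All names passed in are hypothetical placeholders.
    """
    stmts = [
        # Hypothetical schema: replace with the real columns of the log files.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} "
        f"(event_time STRING, message STRING) "
        f"PARTITIONED BY ({part_col} STRING) "
        f"STORED AS ORC LOCATION '{location}';"
    ]
    for d in days:
        # Assumes each day's ORC files live in the directory <location>/<YYYY-MM-DD>.
        stmts.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({part_col}='{d}') "
            f"LOCATION '{location}/{d}';"
        )
    for d in days:
        # Merge only the files inside one day's partition.
        stmts.append(f"ALTER TABLE {table} PARTITION ({part_col}='{d}') CONCATENATE;")
    return stmts


if __name__ == "__main__":
    days = daily_partitions(date(2020, 12, 1), date(2020, 12, 3))
    for stmt in build_hiveql("app_logs", "/data/app_logs", "log_day", days):
        print(stmt)
```

    Issuing one CONCATENATE per day keeps each merge scoped to a single partition, so log files from different days are never combined. If the day directories were instead named in Hive's col=value style (log_day=2020-12-01), MSCK REPAIR TABLE app_logs; could register the partitions in place of the explicit ADD PARTITION statements.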

    See here: LanguageManual+ORC
