Hive external table optimal partition size

迷失自我 2021-01-15 02:32

What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we are getting about 2 GB of data daily.

3 answers
  •  陌清茗 (original poster)
    2021-01-15 03:16

    Hive partitioning is most effective when the data is sparse. By sparse I mean that the data naturally divides into visible partitions, such as by year, month, or day.

    In your case, partitioning by day doesn't make much sense, because each day holds only about 2 GB of data, which is not too big to handle. Partitioning by week or month makes more sense: it keeps query time down and avoids creating too many small partition files.
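    To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes a steady 2 GB per day (the figure from the question), a 365-day year, and roughly 52 weeks per year. The names `DAILY_GB` and `partition_size_gb` are made up for illustration and are not part of Hive.

    ```python
    DAILY_GB = 2.0       # from the question: about 2 GB of new data per day
    DAYS_PER_YEAR = 365  # assumption: non-leap year, steady daily volume


    def partition_size_gb(partitions_per_year):
        """Approximate size in GB of one partition, given how many
        partitions a year of data is split into."""
        return DAILY_GB * DAYS_PER_YEAR / partitions_per_year


    # Partitions per year for each candidate granularity.
    # 52 weeks per year is an approximation.
    granularities = [("day", 365), ("week", 52), ("month", 12)]

    for name, count in granularities:
        size = partition_size_gb(count)
        print(f"{name:>5}: {count} partitions/year, about {size:.1f} GB each")
    ```

    Coarser granularity gives fewer, larger partitions, so the choice mostly depends on how narrow a date range your typical query filters on.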
