Partitioning in Hive

Submitted by 三世轮回 on 2019-12-10 21:03:00

Question


I'm using static partitioning in Hive to segregate the data into subdirectories based on a date field. Since I have daily loads into Hive, I'll need 365 partitions per year for each table (14 tables in total).

Is there any limitation on the number of static partitions that can be created in Hive?

Dynamic partitioning throws an error during Sqoop import if the number of partitions exceeds the hive.exec.max.dynamic.partitions.pernode threshold (default 100).

I have a 5-node HDP cluster, of which 3 are datanodes.

Will it hamper the performance of the cluster if I increase the number of partitions that can be created in Hive?

Is that limitation only for dynamic partitioning, or does it apply to static partitioning as well?

Reference

Check the Troubleshooting and Best Practices section: https://cwiki.apache.org/confluence/display/Hive/Tutorial

Kindly suggest.


Answer 1:


For partitioning on a date field, the best approach is to partition by year/month/day.

That said, you should choose your partition strategy based on your requirements. There is no hard limit on the number of partitions as such, unless you are over-partitioning, i.e., unnecessarily creating too many partitions with each partition storing only a very small amount of data. Many tiny partitions mean many small files and heavy metastore and NameNode overhead, which hurts query planning and cluster performance.
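As a sketch of the year/month/day layout described above (the table and column names here are illustrative, not from the question):

```sql
-- Hypothetical table partitioned hierarchically by year/month/day
-- instead of a single flat date partition column.
CREATE TABLE sales (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS ORC;

-- Static partitioning: the target partition is named explicitly in the
-- statement, so the dynamic-partition limits do not apply to this load.
INSERT INTO TABLE sales PARTITION (year = 2019, month = 12, day = 10)
SELECT order_id, amount
FROM staging_sales
WHERE load_date = '2019-12-10';
```

The hierarchical layout also lets queries that filter only on year or month prune whole directory subtrees rather than scanning all daily partitions.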

Regarding the error, you can fix it by raising the limit: set hive.exec.max.dynamic.partitions.pernode to a higher value in Hive.
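For example, before running a large dynamic-partition insert you can raise the limits for the session (the numbers below are illustrative; size them to your actual partition count):

```sql
-- These properties affect dynamic partitioning only;
-- static partitions are not subject to these limits.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 2000;          -- total across the whole job
SET hive.exec.max.dynamic.partitions.pernode = 1000;  -- per mapper/reducer node
```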

Hope this helps.



Source: https://stackoverflow.com/questions/29103221/partitioning-in-hive
