Adding partitions to the external table in hive takes a lot of time

Submitted by 倾然丶 夕夏残阳落幕 on 2020-07-21 07:25:05

Question


I would like to know the best way(s) to add partitions to an external table. I have an external table on S3 in Hive, partitioned as vehicle=/date=/hr=


A new vehicle can be added at any time of day, and some vehicles will have no data for a couple of hours in a day, or for a couple of days.

A few possible solutions:
- msck repair table: it takes a lot of time.
- Add partitions via a script: I may not know when a new vehicle gets created, or which hours have no data for a given vehicle.

How do people generally solve this problem of adding partitions to external tables?


Answer 1:


msck repair table is the right way to do this. If it runs too slowly, try switching off automatic stats gathering before repairing the table:

set hive.stats.autogather=false;

You can enable it again after recovering partitions.

Most probably you are hitting HIVE-18743 or a related bug. Disabling stats autogather helped in my case.
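
Put together, the whole sequence might look roughly like the sketch below. The table name vehicle_data is hypothetical (the question does not name the table), and restoring the setting to true assumes it was at Hive's default value beforehand.

```sql
-- Hypothetical table name: vehicle_data (not given in the question).

-- Turn off automatic stats gathering for this session so the repair
-- does not compute statistics for every partition it adds.
SET hive.stats.autogather=false;

-- Scan the table location and register any vehicle=/date=/hr=
-- directories that are missing from the metastore.
MSCK REPAIR TABLE vehicle_data;

-- Re-enable stats gathering (true is the Hive default; this assumes
-- the session was using the default before).
SET hive.stats.autogather=true;
```

Because the setting is changed with SET, it applies only to the current session, so other jobs are not affected while the repair runs.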



Source: https://stackoverflow.com/questions/57882477/adding-partitions-to-the-external-table-in-hive-takes-a-lot-of-time
