Aggregate queries fail in hive if partition directory doesn't exist

故事扮演 提交于 2019-12-12 02:18:20

问题


I am using Hive v1.2.1 with Tez. I have an external partitioned table. The partitions are hourly and of the form p=yyyy_mm_dd_hh. The situation is that these partition directories in hdfs are likely to be deleted sometime. After they are deleted, hive still contains the metadata for that partition, and a command 'show partitions ' would still list the partition whose directory was deleted from hdfs. Normally, this is not likely to cause any problem, and a select query for the partition(whose directory was deleted) would simply result an empty resultset:

hive> select * from test_tab where p='2015_01_01_01';
OK
Time taken: 2.168 seconds

However, on running any aggregate query against the same partition, I get an error:

hive> select count(*) from test_tab where p='2015_01_01_01';
FAILED: SemanticException java.io.FileNotFoundException: File hdfs://localhost:8020/user/root/data/test_db/test_tab/p=2015_01_01_01 does not exist.

I need to have the same behavior in aggregate queries as that in other select queries. This is probably a bug in hive. Any workaround-hints for this issue would be appreciated. Best Regards.


回答1:


run below command

msck repair table test_tab;

and then run your query



来源:https://stackoverflow.com/questions/37456932/aggregate-queries-fail-in-hive-if-partition-directory-doesnt-exist

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!