Hive scanning entire data for bucketed table

Posted by 时间秒杀一切 on 2019-12-06 12:47:45

Question


I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string, 
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS

I then inserted the data after executing "set hive.enforce.bucketing = true;".

When I run the query "select * from source_bckt where uk='1179724';", the MapReduce job it spawns scans the entire set of files, even though the matching rows should all be in a single file, which can be identified by the equation HASH('1179724') % 10.
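To make the equation concrete, here is a minimal Java sketch of how the bucket number could be worked out by hand. It rests on assumptions not stated in the question: that the table uses Hive's legacy bucketing hash, which for a string is a 31-based rolling hash over its UTF-8 bytes (the same value as Java's String.hashCode() for ASCII keys), and that the bucket is (hash & Integer.MAX_VALUE) % numBuckets. Newer Hive releases may use a different hash function for bucketing, so treat the result as illustrative. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BucketSketch {

    // Assumed legacy Hive string hash: 31-based rolling hash over the UTF-8 bytes.
    static int legacyStringHash(String key) {
        int h = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // Assumed bucket assignment: clear the sign bit, then take the remainder.
    static int bucketFor(String key, int numBuckets) {
        return (legacyStringHash(key) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Key and bucket count are taken from the question.
        int bucket = bucketFor("1179724", 10);
        System.out.println("bucket = " + bucket); // prints "bucket = 7" under the assumptions above
    }
}
```

Under these assumptions the key lands in bucket 7 (counting from 0), so a query that pruned by bucket would only need to read the one file holding that bucket.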

Any idea why?


Answer 1:


This optimization is not supported yet.
The JIRA ticket tracking it currently has the status PATCH AVAILABLE:

https://issues.apache.org/jira/browse/HIVE-5831



Source: https://stackoverflow.com/questions/43608422/hive-scanning-entire-data-for-bucketed-table
