Hive - How to efficiently Create Table As Select?

Submitted by 大憨熊 on 2019-12-12 03:43:24

Question


I have a Hive table, htable, that's partitioned on foo and bar. I want to create a small subset of this table for experiments, so I would think the thing to do would be:

    create table new_table like htable;

    insert into new_table partition (foo, bar) select * from htable
    where rand() < 0.01 and foo in (a,b)

However, this takes forever and finally fails with a java.lang.OutOfMemoryError: Java heap space. Is there a better way?


Answer 1:


Add distribute by foo, bar:

    insert into new_table partition (foo, bar) select * from htable
    where rand() < 0.01 and foo in (a,b)
    distribute by foo, bar

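This will reduce memory consumption: rows that belong to the same partition are routed to the same reducer, so each task keeps far fewer partition writers (and their buffers) open at once instead of every task writing to every partition.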
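For reference, here is a minimal sketch of the whole flow as one script. The two SET lines are an assumption that is not part of the original question or answer: a fully dynamic partition insert typically needs dynamic partitioning enabled and the mode set to nonstrict, and your cluster may already have them configured. The quoted 'a' and 'b' are placeholder partition values that assume foo is a string column.

```sql
-- Assumption: enable fully dynamic partition inserts for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Empty copy of the source table with the same schema and partitioning.
CREATE TABLE new_table LIKE htable;

-- SELECT * returns the partition columns (foo, bar) last, which is the
-- order the PARTITION clause expects for dynamic partitions.
-- 'a' and 'b' are placeholder values, not real partition names.
INSERT INTO TABLE new_table PARTITION (foo, bar)
SELECT *
FROM htable
WHERE rand() < 0.01
  AND foo IN ('a', 'b')
-- Send all rows of one (foo, bar) partition to the same reducer.
DISTRIBUTE BY foo, bar;
```

Because rand() is evaluated per row, the result is roughly a 1% sample rather than an exact one, and it will differ between runs.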



Source: https://stackoverflow.com/questions/39272906/hive-how-to-efficiently-create-table-as-select
