Hive loading in partitioned table

喜你入骨 提交于 2019-11-27 19:29:47

Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

The quick context is that:

  1. Load data simply copies data, it doesn't read it so it cannot figure out what to partition
  2. Would suggest that you load data into an intermediate table first (or using an external table pointing to all the files) and then letting partition dynamic insert to kick in to load it into a partitioned table

I worked this very same scenario, but instead, what we did is create separate HDFS data files for each partition you need to load.

Since our data is coming from a MapReduce job, we used MultipleOutputs in our Reducer class to multiplex the data into their corresponding partition file. Afterwards, it is just a matter of building the script using the Partition from the HDFS file name.

appleboy
  1. As mentioned in @Denny Lee's answer, we need to involve a staging table(invites_stg) managed or external and then INSERT from staging table to partitioned table(invites in this case).

  2. Make sure we have these two properties set to:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    
  3. And finally insert to invites,

    INSERT OVERWRITE TABLE India PARTITION (STATE) SELECT COL's FROM invites_stg;
    

Refer this link for help: http://www.edupristine.com/blog/hive-partitions-example

Wamp Institute
CREATE TABLE India (

OFFICE_NAME STRING,

OFFICE_STATUS     STRING,

PINCODE           INT,

TELEPHONE   BIGINT,

TALUK       STRING,

DISTRICT    STRING,

POSTAL_DIVISION   STRING,

POSTAL_REGION     STRING,

POSTAL_CIRCLE     STRING

)

PARTITIONED BY (STATE   STRING)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS TEXTFILE;

5. Instruct hive to dynamically load partitions

SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

How about

LOAD DATA INPATH '/path/to/HDFS/dir/file.csv' OVERWRITE INTO TABLE DB.EXAMPLE_TABLE PARTITION (PARTITION_COL_NAME='PARTITION_VALUE');

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!