Hive loading in partitioned table

长发绾君心 2020-12-05 05:20

I have a log file in HDFS whose values are comma-delimited. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load

5 Answers
  • 2020-12-05 06:03

    Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

    The quick context is that:

    1. LOAD DATA simply moves or copies the files into the table's location. It does not read them, so it cannot work out which partition each row belongs to.
    2. I would suggest loading the data into an intermediate table first (or using an external table that points to all the files) and then letting a dynamic-partition insert load it into the partitioned table.
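
    For the log format in the question, the two points above might be sketched as follows. The table names (events_stg, events), the column names, the dt partition column, and the HDFS path are all hypothetical, and partitioning by day is an assumption, since the question does not say which column to partition on. substr(event_time, 1, 10) assumes timestamps shaped like 2012-10-11 12:00.

    ```sql
    -- Staging: external table over the raw comma-delimited files (hypothetical path).
    CREATE EXTERNAL TABLE events_stg (
      event_time STRING,
      event_name STRING,
      user_id    STRING,
      device_id  STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/path/to/logs';

    -- Target: same data columns, partitioned by day (assumed partition key).
    CREATE TABLE events (
      event_time STRING,
      event_name STRING,
      user_id    STRING,
      device_id  STRING
    )
    PARTITIONED BY (dt STRING);

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Hive reads every row here, so it can route each one to its partition.
    -- The partition column must come last in the SELECT list.
    INSERT OVERWRITE TABLE events PARTITION (dt)
    SELECT event_time, event_name, user_id, device_id,
           substr(event_time, 1, 10) AS dt
    FROM events_stg;
    ```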
  • 2020-12-05 06:03

    I worked on this very same scenario, but what we did instead was create a separate HDFS data file for each partition to be loaded.

    Since our data comes from a MapReduce job, we used MultipleOutputs in our Reducer class to route the records into their corresponding partition files. After that, it is just a matter of building the load script, taking each partition value from the HDFS file name.
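
    Such a script could be as simple as one LOAD DATA statement per file, as in the sketch below. The events table, its dt partition column, and the file paths are hypothetical. The sketch assumes that each file name carries the partition value and that each file holds rows for that partition only, because LOAD DATA does not inspect the contents.

    ```sql
    -- One statement per partition file; the partition value is taken from the file name.
    LOAD DATA INPATH '/path/to/output/2012-10-11-r-00000'
    INTO TABLE events PARTITION (dt='2012-10-11');

    LOAD DATA INPATH '/path/to/output/2012-10-12-r-00000'
    INTO TABLE events PARTITION (dt='2012-10-12');
    ```

    With an HDFS source path, each statement moves the file into the directory of the named partition.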

  • 2020-12-05 06:08
    CREATE TABLE India (
        OFFICE_NAME       STRING,
        OFFICE_STATUS     STRING,
        PINCODE           INT,
        TELEPHONE         BIGINT,
        TALUK             STRING,
        DISTRICT          STRING,
        POSTAL_DIVISION   STRING,
        POSTAL_REGION     STRING,
        POSTAL_CIRCLE     STRING
    )
    PARTITIONED BY (STATE STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    

    5. Instruct Hive to load partitions dynamically:

    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
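
    If a single insert creates a large number of partitions, Hive's limits on dynamic partitions may also need raising. The fragment below is only a sketch: the property names are standard Hive settings, but the values are illustrative assumptions, not recommendations.

    ```sql
    -- Upper bound on dynamic partitions created by one statement (illustrative value).
    SET hive.exec.max.dynamic.partitions = 2000;
    -- Upper bound per mapper/reducer node (illustrative value).
    SET hive.exec.max.dynamic.partitions.pernode = 500;
    ```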
    
  • 2020-12-05 06:10
    1. As mentioned in @Denny Lee's answer, we need a staging table (invites_stg), either managed or external, and then an INSERT from the staging table into the partitioned table (India in this case).

    2. Make sure these two properties are set:

      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;
      
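    3. Finally, insert into India, selecting the partition column (STATE) last:

      -- col1, ..., colN stand for the staging table's data columns; STATE must come last.
      INSERT OVERWRITE TABLE India PARTITION (STATE) SELECT col1, ..., colN, STATE FROM invites_stg;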
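
    To check that the insert created the expected partitions, something like the following could be run afterwards. It uses only the India table and its STATE partition column from this thread.

    ```sql
    -- List the partitions Hive created, one per distinct STATE value.
    SHOW PARTITIONS India;

    -- Row counts per partition, to spot empty or unexpected states.
    SELECT STATE, COUNT(*) AS row_count
    FROM India
    GROUP BY STATE;
    ```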
      

    Refer to this link for help: http://www.edupristine.com/blog/hive-partitions-example

  • 2020-12-05 06:15

    How about

    LOAD DATA INPATH '/path/to/HDFS/dir/file.csv' OVERWRITE INTO TABLE DB.EXAMPLE_TABLE PARTITION (PARTITION_COL_NAME='PARTITION_VALUE');
