I have a log file in HDFS, values are delimited by comma. For example:
2012-10-11 12:00,opened_browser,userid111,deviceid222
Now I want to load
Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.
The quick context is that:
I worked this very same scenario, but instead, what we did is create separate HDFS data files for each partition you need to load.
Since our data is coming from a MapReduce job, we used MultipleOutputs in our Reducer class to multiplex the data into their corresponding partition file. Afterwards, it is just a matter of building the script using the Partition from the HDFS file name.
CREATE TABLE India (
OFFICE_NAME STRING,
OFFICE_STATUS STRING,
PINCODE INT,
TELEPHONE BIGINT,
TALUK STRING,
DISTRICT STRING,
POSTAL_DIVISION STRING,
POSTAL_REGION STRING,
POSTAL_CIRCLE STRING
)
PARTITIONED BY (STATE STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
5. Instruct hive to dynamically load partitions
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
As mentioned in @Denny Lee's answer, we need to involve a staging table(invites_stg) managed or external and then INSERT from staging table to partitioned table(invites in this case).
Make sure we have these two properties set to:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
And finally insert to invites,
INSERT OVERWRITE TABLE India PARTITION (STATE) SELECT COL's FROM invites_stg;
Refer this link for help: http://www.edupristine.com/blog/hive-partitions-example
How about
LOAD DATA INPATH '/path/to/HDFS/dir/file.csv' OVERWRITE INTO TABLE DB.EXAMPLE_TABLE PARTITION (PARTITION_COL_NAME='PARTITION_VALUE');