问题
I am developing a batch job that loads data into Hive tables from HDFS files. The flow of data is as follows
- Read the file received in HDFS using an external Hive table
- INSERT OVERWRITE the final hive table from the external Hive table applying certain transformations
- Move the received file to Archive
This flow works fine if there is a file in the input directory for the external table to read during step 1. If there is no file, the external table will be empty and as a result executing step 2 will empty the final table. If the external table is empty, I would like to keep the existing data in the final table (the data loaded during the previous execution).
Is there a hive property that I can set so that the final table is overwritten only if we are overwriting it with some data?
I know that I can check if the input file exists using an HDFS command and conditionally launch the Hive requests. But I am wondering if I can achieve the same behavior directly in Hive which would help me avoid this extra verification
回答1:
Try to add dummy partition to your table, say LOAD_TAG and use dynamic partition load:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE your_table PARTITION(LOAD_TAG)
select
col1,
...
colN,
'dummy_value' as LOAD_TAG
from source_table;
The partition value should always be the same in your case.
来源:https://stackoverflow.com/questions/47451261/is-there-a-way-to-prevent-a-hive-table-from-being-overwritten-if-the-select-quer