Is there a way to prevent a Hive table from being overwritten if the SELECT query of the INSERT OVERWRITE does not return any results?

若如初见. Submitted on 2020-02-21 04:27:12

Question


I am developing a batch job that loads data into Hive tables from HDFS files. The flow of data is as follows:

  1. Read the file received in HDFS using an external Hive table
  2. INSERT OVERWRITE the final Hive table from the external Hive table, applying certain transformations
  3. Move the received file to Archive

This flow works fine if there is a file in the input directory for the external table to read during step 1. If there is no file, the external table will be empty and as a result executing step 2 will empty the final table. If the external table is empty, I would like to keep the existing data in the final table (the data loaded during the previous execution).
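
To make the failure mode concrete, steps 1 and 2 might look like the following minimal sketch. The table names (staging_ext, final_table), the columns, the delimiter, and the HDFS path are all hypothetical, since the question does not give them; final_table is assumed to already exist as an unpartitioned table.

```sql
-- Hypothetical names throughout: staging_ext, final_table, the columns,
-- the field delimiter, and the HDFS location are assumptions for illustration.

-- Step 1: external table that reads whatever file sits in the input directory.
CREATE EXTERNAL TABLE IF NOT EXISTS staging_ext (
  id     STRING,
  amount STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/incoming/my_feed';

-- Step 2: overwrite the unpartitioned final table with the transformed rows.
-- If the input directory holds no file, the SELECT returns zero rows and,
-- as described above, final_table is left empty.
INSERT OVERWRITE TABLE final_table
SELECT
  id,
  CAST(amount AS DECIMAL(18,2)) AS amount
FROM staging_ext;
```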

Is there a Hive property that I can set so that the final table is overwritten only if we are overwriting it with some data?

I know that I can check whether the input file exists using an HDFS command and launch the Hive queries conditionally. But I am wondering whether I can achieve the same behavior directly in Hive, which would let me avoid this extra verification.


Answer 1:


Try adding a dummy partition column to your table, say LOAD_TAG, and load it with dynamic partitioning:

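-- Enable dynamic partitioning; nonstrict mode allows all partition columns to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE your_table PARTITION(LOAD_TAG)
SELECT
      col1,
      ...
      colN,
      'dummy_value' AS LOAD_TAG
  FROM source_table;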

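The partition value should always be the same in your case, so that every load targets the same single partition. With a dynamic partition load, Hive overwrites only the partitions for which the query actually returns rows. If source_table is empty, no partition is produced and the existing data stays in place.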
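
For this to work, the target table has to be declared with the dummy partition column. A minimal sketch of such a definition follows; the column types are assumptions, and only the PARTITIONED BY clause matters. An existing unpartitioned table would most likely have to be recreated in this form and reloaded once, since the partition column is part of the table definition.

```sql
-- Hypothetical schema: the column types are assumptions for illustration.
-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE IF NOT EXISTS your_table (
  col1 STRING,
  colN STRING
)
PARTITIONED BY (load_tag STRING);

-- After a load, check which partitions exist.
SHOW PARTITIONS your_table;
```

Queries against your_table keep working as before; the partition column simply appears as an additional, constant-valued column after the regular ones.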



Source: https://stackoverflow.com/questions/47451261/is-there-a-way-to-prevent-a-hive-table-from-being-overwritten-if-the-select-quer
