How to delete and update a record in Hive

梦如初夏 2020-11-28 19:26

I have installed Hadoop, Hive, and the Hive JDBC driver, and they are running fine for me. But I still have a problem: how do I delete or update a single record using Hive, given that DELETE and UPDATE statements don't seem to be supported?

15 answers
  • 2020-11-28 20:09

    UPDATE and DELETE of individual records are not allowed in Hive, but INSERT INTO is.
    A snippet from Hadoop: The Definitive Guide (3rd edition):

    Updates, transactions, and indexes are mainstays of traditional databases. Yet, until recently, these features have not been considered a part of Hive's feature set. This is because Hive was built to operate over HDFS data using MapReduce, where full-table scans are the norm and a table update is achieved by transforming the data into a new table. For a data warehousing application that runs over large portions of the dataset, this works well.

    Hive doesn't support updates (or deletes), but it does support INSERT INTO, so it is possible to add new rows to an existing table.
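    As a concrete illustration of that append-only pattern, here is a minimal sketch; the employees and employees_new tables and their columns are hypothetical:

      -- Appending rows is supported: copy new records in from a staging table.
      -- (Hypothetical table and column names, used only to illustrate INSERT INTO.)
      INSERT INTO TABLE employees
      SELECT id, name, salary
      FROM employees_new;

      -- By contrast, a statement like the following is rejected on a classic,
      -- non-transactional Hive table:
      -- UPDATE employees SET salary = 80000.0 WHERE id = 101;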

  • 2020-11-28 20:11

    Yes, rightly said. Hive does not support an UPDATE statement, but the following alternative can be used to achieve the same result:

    Update records in a partitioned Hive table:

    1. The main table is assumed to be partitioned by some key.
    2. Load the incremental data (the data to be updated) to a staging table partitioned with the same keys as the main table.
    3. Join the two tables (main & staging tables) using a LEFT OUTER JOIN operation as below:

      INSERT OVERWRITE TABLE main_table PARTITION (c, d)
      SELECT t2.a, t2.b, t2.c, t2.d
      FROM staging_table t2
      LEFT OUTER JOIN main_table t1
        ON t1.a = t2.a;

    In the above example, the main_table & the staging_table are partitioned using the (c,d) keys. The tables are joined via a LEFT OUTER JOIN and the result is used to OVERWRITE the partitions in the main_table.

    A similar approach can be used for UPDATE operations on un-partitioned Hive tables too; a fuller sketch of the join follows below.
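    One way to keep the rows that have no match in the staging table is to drive the join from the main table and let COALESCE pick the staged value where one exists. Below is a rough sketch reusing the a, b and (c,d) columns from the example above; the SET statements and the assumption that a is the join key and b the updated column are additions for illustration:

      -- Dynamic-partition overwrite needs these session settings.
      SET hive.exec.dynamic.partition = true;
      SET hive.exec.dynamic.partition.mode = nonstrict;

      INSERT OVERWRITE TABLE main_table PARTITION (c, d)
      SELECT
          t1.a,
          COALESCE(t2.b, t1.b) AS b,   -- take the staged value when present
          t1.c,
          t1.d
      FROM main_table t1
      LEFT OUTER JOIN staging_table t2
        ON  t1.a = t2.a
       AND  t1.c = t2.c
       AND  t1.d = t2.d;

    Note that this variant rewrites every partition it reads, so in practice the scan is usually restricted to the affected partitions.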

  • 2020-11-28 20:11

    If you want to delete all records, then as a workaround you can load an empty file into the table in OVERWRITE mode:

    hive> LOAD DATA LOCAL INPATH '/root/hadoop/textfiles/empty.txt' OVERWRITE INTO TABLE employee;
    Loading data to table default.employee
    Table default.employee stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
    OK
    Time taken: 0.19 seconds
    
    hive> SELECT * FROM employee;
    OK
    Time taken: 0.052 seconds
    