How to delete and update a record in Hive

梦如初夏 2020-11-28 19:26

I have installed Hadoop, Hive, and Hive JDBC, which are all running fine for me. But I still have a problem: how do I delete or update a single record using Hive, because delete or upd

15 answers
  • 2020-11-28 20:02

    A few properties must be set for a Hive table to support ACID semantics and SQL-style UPDATE, INSERT, and DELETE.

    Conditions for creating an ACID table in Hive:

    1. The table must be stored as an ORC file. Only the ORC format supports ACID properties for now.
    2. The table must be bucketed.

    Properties to set to create ACID table:

    set hive.support.concurrency=true;
    set hive.enforce.bucketing=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.compactor.initiator.on=true;
    set hive.compactor.worker.threads=1;
    set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    

    Set the property hive.in.test to true in hive-site.xml.

    After setting all these properties, create the table with the table property 'transactional'='true'. The table must be bucketed and stored as ORC:

    CREATE TABLE table_name (col1 int, col2 string, col3 int)
    CLUSTERED BY (col1) INTO 4 BUCKETS
    STORED AS orc TBLPROPERTIES ('transactional'='true');
    

    Now the Hive table can support UPDATE and DELETE queries
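
    With the table above in place, a quick check might look like the following sketch. The literal values are made up for illustration, and the statements assume the session properties listed earlier are already set. Because col1 is the bucketing column and bucketing columns cannot be updated, the UPDATE changes col2 instead.

    ```sql
    -- Confirm that the table carries the transactional property.
    SHOW TBLPROPERTIES table_name;

    -- Load a few hypothetical rows.
    INSERT INTO TABLE table_name VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);

    -- Update a non-bucketing column of a single row.
    UPDATE table_name SET col2 = 'updated' WHERE col1 = 2;

    -- Delete a single row.
    DELETE FROM table_name WHERE col1 = 3;
    ```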

  • 2020-11-28 20:04

    The CLI told you where your mistake is: delete WHAT? from student ...

    Delete: How to delete/truncate tables from Hadoop-Hive?

    Update: Update, SET option in Hive

  • 2020-11-28 20:05

    I recently had to solve a similar problem. Apache Hive and Hadoop do not support update/delete operations, so you have two options:

    1. Use a backup table: save the whole table in a backup table, truncate your input table, then rewrite only the data you want to keep.
    2. Use Uber Hudi: a framework created by Uber to work around HDFS limitations, including deletion and update. Take a look at this link: https://eng.uber.com/hoodie/

    An example for point 1:

    Create table bck_table like input_table;
    Insert overwrite table bck_table 
        select * from input_table;
    Truncate table input_table;
    Insert overwrite table input_table
        select * from bck_table where id <> 1;
    

    NB: If input_table is an external table, see: How to truncate a partitioned external table in Hive?
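
    The same idea can be sketched without the intermediate table, assuming input_table is a plain (non-transactional) managed table: INSERT OVERWRITE can read from the table it is writing to. The name and status columns below are hypothetical, chosen only to show how an update can be emulated with CASE.

    ```sql
    -- "Delete" the row with id = 1 by rewriting the table without it.
    INSERT OVERWRITE TABLE input_table
        SELECT * FROM input_table WHERE id <> 1;

    -- "Update" one row by rewriting every row and changing a value on the way.
    -- Assumes input_table has exactly the columns (id, name, status), in that order.
    INSERT OVERWRITE TABLE input_table
        SELECT id,
               CASE WHEN id = 2 THEN 'new name' ELSE name END AS name,
               status
        FROM input_table;
    ```

    Both statements rewrite the entire table, so this only makes sense for small tables or infrequent changes.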

  • 2020-11-28 20:06

    Once you have installed and configured Hive, create a simple table:

    hive>create table testTable(id int,name string)row format delimited fields terminated by ',';
    

    Then try to insert a few rows into the test table.

    hive>insert into table testTable values (1,'row1'),(2,'row2');
    

    Now try to delete one of the records you just inserted into the table.

    hive>delete from testTable where id = 1;
    

    Error! FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

    By default, transactions are turned off. The error says that the configured transaction manager does not support update or delete operations. To support update/delete, you must change the following configuration.

    cd  $HIVE_HOME
    vi conf/hive-site.xml
    

    Add the properties below to the file:

    <property>
      <name>hive.support.concurrency</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.enforce.bucketing</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.exec.dynamic.partition.mode</name>
      <value>nonstrict</value>
    </property>
    <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.compactor.worker.threads</name>
      <value>2</value>
    </property>
    

    Restart the service and then try the delete command again:

    Error!

    FAILED: LockException [Error 10280]: Error communicating with the metastore.

    There is a problem with the metastore. To use insert/update/delete operations, you need to add the following configuration to conf/hive-site.xml, because the feature is still under development.

    <property>
      <name>hive.in.test</name>
      <value>true</value>
    </property>
    

    Restart the service and then run the delete command again:

    hive>delete from testTable where id = 1;
    

    Error!

    FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.testTable that does not use an AcidOutputFormat or is not bucketed.

    Only ORC file format is supported in this first release. The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC.

    Tables must be bucketed to make use of these features. Tables in the same system not using transactions and ACID do not need to be bucketed.

    The example below creates a table stored as ORC, with bucketing enabled and 'transactional'='true'.

    hive>create table testTableNew(id int ,name string ) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
    

    Insert:

    hive>insert into table testTableNew values (1,'row1'),(2,'row2'),(3,'row3');
    

    Update:

    hive>update testTableNew set name = 'updateRow2' where id = 2;
    

    Delete:

    hive>delete from testTableNew where id = 1;
    

    Test:

    hive>select * from testTableNew ;
    
  • 2020-11-28 20:07

    As of Hive version 0.14.0: INSERT...VALUES, UPDATE, and DELETE are now available with full ACID support.

    INSERT ... VALUES Syntax:

    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
    

    Here values_row is ( value [, value ...] ), where each value is either null or any valid SQL literal.

    UPDATE Syntax:

    UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
    

    DELETE Syntax:

    DELETE FROM tablename [WHERE expression]
    

    Additionally, from the Hive Transactions doc:

    If a table is to be used in ACID writes (insert, update, delete) then the table property "transactional" must be set on that table, starting with Hive 0.14.0. Without this value, inserts will be done in the old style; updates and deletes will be prohibited.

    Hive DML reference:
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
    Hive Transactions reference:
    https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
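
    To make the syntax templates above concrete, here is a small sketch that fills them in. The page_views table, its columns, and all values are hypothetical, and the statements assume transactions are already enabled in the Hive configuration.

    ```sql
    -- Hypothetical transactional table, partitioned by day and bucketed by user id.
    CREATE TABLE page_views (user_id STRING, link STRING, came_from STRING)
        PARTITIONED BY (datestamp STRING)
        CLUSTERED BY (user_id) INTO 4 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('transactional' = 'true');

    -- INSERT ... VALUES into a static partition.
    INSERT INTO TABLE page_views PARTITION (datestamp = '2014-09-23')
        VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

    -- UPDATE a regular column (partition and bucketing columns cannot be updated).
    UPDATE page_views SET came_from = 'unknown' WHERE came_from IS NULL;

    -- DELETE the rows matching a predicate.
    DELETE FROM page_views WHERE user_id = 'jdoe';
    ```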

  • 2020-11-28 20:09

    Configuration Values to Set for INSERT, UPDATE, DELETE

    In addition to the new parameters listed above, some existing parameters need to be set to support INSERT ... VALUES, UPDATE, and DELETE.

    Configuration key                   Must be set to
    hive.support.concurrency            true (default is false)
    hive.enforce.bucketing              true (default is false) (not required as of Hive 2.0)
    hive.exec.dynamic.partition.mode    nonstrict (default is strict)

    Configuration Values to Set for Compaction

    If the data in your system is not owned by the Hive user (i.e., the user that the Hive metastore runs as), then Hive will need permission to run as the user who owns the data in order to perform compactions. If you have already set up HiveServer2 to impersonate users, then the only additional work to do is assure that Hive has the right to impersonate users from the host running the Hive metastore. This is done by adding the hostname to hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file. If you have not already done this, then you will need to configure Hive to act as a proxy user. This requires you to set up keytabs for the user running the Hive metastore and add hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file. See the Hadoop documentation on secure mode for your version of Hadoop (e.g., for Hadoop 2.5.1 it is at Hadoop in Secure Mode).
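
    Once the permissions are sorted out, compaction can also be requested and monitored by hand. A minimal sketch, assuming a transactional table with the hypothetical name table_name and at least one compactor worker thread configured:

    ```sql
    -- Queue a major compaction for the table.
    ALTER TABLE table_name COMPACT 'major';

    -- List compaction requests and their current state.
    SHOW COMPACTIONS;
    ```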

    The UPDATE statement has the following limitations:

    The expression in the WHERE clause must be an expression supported by a Hive SELECT clause.

    Partition and bucket columns cannot be updated.

    Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.

    Subqueries are not allowed on the right side of the SET statement.

    The following example demonstrates the correct usage of this statement:

    UPDATE students SET name = null WHERE gpa <= 1.0;

    DELETE Statement

    Use the DELETE statement to delete data already written to Apache Hive.

    DELETE FROM tablename [WHERE expression];

    The DELETE statement has the following limitation: query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.

    The following example demonstrates the correct usage of this statement:

    DELETE FROM students WHERE gpa <= 1.0;
