How does updating data in Hive transactional tables result in file creation/updates in HDFS?

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 08:53:53

Question


By enabling transactions in Hive, we can update records. Assume I'm using the AVRO format for my Hive table.

https://hortonworks.com/hadoop-tutorial/using-hive-acid-transactions-insert-update-delete-data/

How does Hive take care of updating an AVRO file and replicating it again on different servers (because the replication factor is 3)?

I could not find a good article that explains this, or the consequences of using ACID in Hive. Since HDFS is recommended for non-updating or append-only files, how does updating a record in the middle of a file work?

Please advise.


Answer 1:


Data for the table is stored in a set of base files. New records, updates, and deletes are stored in delta files. A new set of delta files is created for each transaction (or in the case of streaming agents such as Flume or Storm, each batch of transactions) that alters a table. At read time the reader merges the base and delta files, applying any updates and deletes as it reads.
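The merge-on-read behaviour described above can be illustrated with a small in-memory sketch. This is not Hive's actual reader or file format: the names `merge_on_read`, `DeltaEvent`, and the tuple layout are hypothetical, and each "file" is just a Python object. The point it shows is that an update never rewrites the base file; it only adds a new delta that the reader applies in transaction order.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical delta record: (transaction_id, row_id, new_row), where
# new_row is None for a delete. Hive's real on-disk layout differs.
DeltaEvent = Tuple[int, int, Optional[Row]]


def merge_on_read(base: Dict[int, Row],
                  deltas: List[List[DeltaEvent]]) -> Dict[int, Row]:
    """Return the table snapshot a reader would see, without modifying base."""
    result = dict(base)  # work on a copy; the base "file" stays untouched
    # Flatten all delta files and apply their events in transaction order.
    events = sorted((e for d in deltas for e in d), key=lambda e: e[0])
    for _txn_id, row_id, row in events:
        if row is None:
            result.pop(row_id, None)   # delete
        else:
            result[row_id] = row       # insert or update
    return result


base = {1: {"name": "alice"}, 2: {"name": "bob"}}
delta_txn_5 = [(5, 2, {"name": "bobby"}), (5, 3, {"name": "carol"})]
delta_txn_6 = [(6, 1, None)]

# Delta files may be listed in any order; transaction ids decide the outcome.
snapshot = merge_on_read(base, [delta_txn_6, delta_txn_5])
print(snapshot)
```

Because every change lands in a new delta file, the existing files are only ever read, which is how this scheme fits a file system that does not support in-place updates.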

Subsequently, a periodic major compaction merges the accumulated delta files and the base file into a new base file, which speeds up later table scans.
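As a rough sketch of what compaction does to the set of files, the following in-memory model folds deltas together. The function names and the tuple layout are hypothetical and only model the idea. The minor compaction step (several deltas rewritten into one delta) is not mentioned above; it is included here as described in the Hive Transactions wiki linked at the end of this answer.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical delta record: (transaction_id, row_id, new_row or None for delete).
DeltaEvent = Tuple[int, int, Optional[Row]]


def minor_compact(deltas: List[List[DeltaEvent]]) -> List[DeltaEvent]:
    """Rewrite many small delta files into a single delta, in transaction order."""
    merged = [event for delta in deltas for event in delta]
    merged.sort(key=lambda e: e[0])
    return merged


def major_compact(base: Dict[int, Row],
                  deltas: List[List[DeltaEvent]]) -> Dict[int, Row]:
    """Fold the base and all deltas into a brand-new base; the old base is not edited."""
    new_base = dict(base)
    for _txn_id, row_id, row in minor_compact(deltas):
        if row is None:
            new_base.pop(row_id, None)   # delete
        else:
            new_base[row_id] = row       # insert or update
    return new_base


old_base = {1: {"v": "a"}}
deltas = [[(3, 1, {"v": "b"})], [(2, 2, {"v": "c"})], [(4, 2, None)]]

single_delta = minor_compact(deltas)
new_base = major_compact(old_base, deltas)
print(single_delta)
print(new_base)
```

After a major compaction, readers scan one base file instead of a base plus many deltas, and the obsolete files can be removed once no reader needs them.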

Inserted/updated/deleted data are periodically compacted to save space and optimize data access.

The ACID Transaction feature currently has these limitations:

  1. It works only with the ORC file format. There is an open-source JIRA to add support for Parquet tables.
  2. It works only for non-sorted, bucketed tables.
  3. INSERT OVERWRITE is not supported for transactional tables.
  4. It does not support BEGIN, COMMIT, or ROLLBACK.
  5. It is not recommended for OLTP workloads.

ACID is not supported for AVRO files, and the HDFS block replication policies are the same for ACID tables as for any other table.

The links below may help in understanding ACID tables in Hive.

http://docs.qubole.com/en/latest/user-guide/hive/use-hive-acid.html

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions



Source: https://stackoverflow.com/questions/44138131/how-updating-data-in-hive-transaction-tables-result-in-file-creation-updation-of
