Insert overwrite partition in Hive table - Values getting duplicated

会有一股神秘感。 Posted on 2019-12-11 13:44:36

Question


I created a non-partitioned Hive table and, using a SELECT query, inserted its data into a partitioned Hive table.

Referred site

  1. By following the above link, my partitioned table ended up containing duplicate values. Below are the steps:

This is my sample employee dataset: link1

I tried the following queries: link2

But after updating a value in the Hive table (setting the salary of Steven, EmployeeID 19, to 50000) with this query:

INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail;

the values are getting duplicated.

7       Nirmal  Tech    12000   A
7       Nirmal  Tech    12000   B

Nirmal belongs to Department A only, but the row is duplicated in Department B.

Am I doing anything wrong?

Please suggest.


Answer 1:


It seems you forgot the WHERE clause in your last INSERT OVERWRITE, so rows from every partition were written into partition A. Restrict the SELECT to the partition being overwritten:

INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A') 
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail 
WHERE department = 'A';
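
One way to confirm the fix is to count rows per partition before and after the overwrite, and to look for employees that appear under more than one department. The queries below are an illustrative sketch, not part of the original answer. They assume employeeid is meant to be unique across the table.

```sql
-- Rows per partition; run before and after the INSERT OVERWRITE.
SELECT department, COUNT(*) AS row_count
FROM Unm_Parti_Trail
GROUP BY department;

-- Employees that appear in more than one partition.
-- Assuming employeeid is unique, this should return no rows.
SELECT employeeid, COUNT(DISTINCT department) AS partition_count
FROM Unm_Parti_Trail
GROUP BY employeeid
HAVING COUNT(DISTINCT department) > 1;
```

If the second query returns rows after an overwrite, the SELECT feeding that overwrite most likely read from partitions other than the one being written.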



Answer 2:


One possible solution.

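When you insert with dynamic partitioning, the partition columns must come last in the SELECT list, and the PARTITION clause names the column without a value. A static spec such as department='A' cannot be combined with selecting Department as well, because the column counts no longer match. This form may also require hive.exec.dynamic.partition.mode to be set to nonstrict. Eg:

INSERT INTO TABLE Unm_Parti_Trail PARTITION(department) 
SELECT EmployeeID, FirstName,Designation,Salary, Department 
FROM Unm_Dup_Parti_Trail
WHERE department='A';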

See this link for more info.
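
If the goal is to update a value without handling each partition separately, a dynamic-partition overwrite is one option. The sketch below is an assumption about how that could look for the question's table, not something taken from the original answers. It assumes the session is allowed to enable dynamic partitioning. It rewrites Unm_Parti_Trail in one statement, with department selected last so that each row returns to its own partition.

```sql
-- Allow Hive to derive the partition value from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- department is the last SELECT column, so each row is written
-- back to the partition it was read from.
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (department)
SELECT employeeid,
       firstname,
       designation,
       CASE WHEN employeeid = 19 THEN 50000 ELSE salary END AS salary,
       department
FROM Unm_Parti_Trail;
```

Because this reads and rewrites the whole table, the single-partition form with a WHERE clause is likely cheaper when only one department changes.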



Source: https://stackoverflow.com/questions/26902998/insert-overwrite-partition-in-hive-table-values-getting-duplicated
