问题
I created a Hive table with Non-partition table and using select query I inserted data into Partitioned Hive table.
Refered site
- By following above link my partition table contains duplicate values. Below are the setps
This is my Sample employee dataset:link1
I tried the following queries: link2
But after updating a value in Hive table,
Updating salary of Steven with EmployeeID 19 to 50000.
INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A') SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail;
the values are getting duplicated.
7 Nirmal Tech 12000 A
7 Nirmal Tech 12000 B
Nirmal is placed in Department A only but it is duplicated to department B.
Am I doing anything wrong?
Please suggest.
回答1:
It seems like you forgot the WHERE clause in your last INSERT OVERWRITE:
INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A')
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail
WHERE department = 'A';
回答2:
One possible solution.
When you do the insert it is necessary to select the partitioning fields as the last ones on the query. Eg:
INSERT INTO TABLE Unm_Parti_Trail PARTITION(department='A')
SELECT EmployeeID, FirstName,Designation,Salary, Department
FROM Unm_Dup_Parti_Trail
WHERE department='A';
See this link for more info.
来源:https://stackoverflow.com/questions/26902998/insert-overwrite-partition-in-hive-table-values-getting-duplicated