Sqoop export inserting duplicate entries

北慕城南 posted on 2019-12-12 03:37:40

Question


I am trying to understand how Sqoop export works. I have a MySQL table named site with two columns, id and url, and it contains two rows:

1,www.yahoo.com
2,www.gmail.com

The table has no primary key.

When I export the entries from HDFS to the MySQL site table by executing the command below, it inserts duplicate entries.

I have the following entries in HDFS:

1,www.one.com
2,www.2.com
3,www.3.com
4,www.4.com

sqoop export --table site --connect jdbc:mysql://localhost/loudacre --username training --password training --export-dir /site/ --update-mode allowinsert --update-key id

So instead of updating the row for an existing id, it inserts a second row with the same id (for example, there are now two rows with id 1: one for www.one.com and one for www.yahoo.com).

Even if I remove --update-key, the outcome is the same. Is this happening because the table doesn't have a primary key?

I am using Sqoop 1.4.5 in the Cloudera QuickStart VM.

Any help?


Answer 1:


As per Sqoop docs,

MySQL will try to insert new row and if the insertion fails with duplicate unique key error it will update appropriate row instead.

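So the --update-key column must either be the primary key or have a unique index on it. Without one, MySQL never raises a duplicate-key error, so every insert succeeds and no row is ever updated.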


Internally, Sqoop generates a query roughly like this:

INSERT INTO site (id, url) VALUES (1, 'www.one.com') ON DUPLICATE KEY UPDATE url = 'www.one.com'

and so on for all other values.
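The effect of the missing unique key can be reproduced without a MySQL server. The sketch below is only an analogy, not what Sqoop runs: it uses Python's built-in sqlite3 module and SQLite's INSERT OR REPLACE as a stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE, since both only take the "update" path when a primary key or unique index is violated. The function name export_rows and the index name site_id_uq are made up for illustration.

```python
import sqlite3


def export_rows(with_unique_key):
    """Seed the site table with the two existing rows, then upsert the four HDFS rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE site (id INTEGER, url TEXT)")
    if with_unique_key:
        # Hypothetical index name; this is what gives the upsert something to conflict on.
        conn.execute("CREATE UNIQUE INDEX site_id_uq ON site (id)")

    # Rows already present in the table before the export.
    conn.executemany(
        "INSERT INTO site (id, url) VALUES (?, ?)",
        [(1, "www.yahoo.com"), (2, "www.gmail.com")],
    )

    # Rows coming from HDFS. INSERT OR REPLACE is SQLite's analogue (an assumption
    # made for this demo) of MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
    hdfs_rows = [
        (1, "www.one.com"),
        (2, "www.2.com"),
        (3, "www.3.com"),
        (4, "www.4.com"),
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO site (id, url) VALUES (?, ?)", hdfs_rows
    )

    rows = conn.execute("SELECT id, url FROM site ORDER BY id, url").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("no unique key:  ", export_rows(with_unique_key=False))
    print("with unique key:", export_rows(with_unique_key=True))
```

Without the unique index the table ends up with six rows, including two each for id 1 and id 2. With the index it ends up with four rows, and ids 1 and 2 carry the new URLs. In MySQL, the corresponding fix would be to remove the existing duplicates and then add a primary key or unique index on id before re-running the export.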



Source: https://stackoverflow.com/questions/39137254/sqoop-export-inserting-duplicate-entries
