Does Sqoop preserve the order of imported rows as in the database?

非 Y 不嫁゛ submitted on 2019-12-24 00:23:22

Question


I am sqooping a table from an Oracle database to AWS S3 and then creating a Hive table over it.

After importing the data, is the order of the records in the database preserved in the Hive table?

I want to fetch a few hundred rows from both the database and Hive using Java JDBC, then compare the rows in the two ResultSets. Assuming I don't have a primary key, can I compare the rows of both ResultSets in the order they appear (sequentially, using resultSet.next()), or does the order change because of the parallel import?
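A minimal sketch of the row-by-row comparison described above (not the asker's code): it reads each ResultSet into memory through ResultSetMetaData and compares the rows position by position. The class and method names (SequentialCompare, readRows, firstMismatch, row) are hypothetical. The check is only meaningful if both queries return rows in the same order. It also assumes equal values come back as equal Java objects, which may not hold across drivers (an Oracle NUMBER usually maps to BigDecimal, a Hive INT to Integer), so real data may need normalizing first.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SequentialCompare {

    // Copies every row of a ResultSet into memory as a list of column values.
    // Fine for a few hundred rows; not meant for large tables.
    static List<List<Object>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        List<List<Object>> rows = new ArrayList<>();
        while (rs.next()) {
            List<Object> row = new ArrayList<>(cols);
            for (int i = 1; i <= cols; i++) { // JDBC column indexes are 1-based
                row.add(rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Hypothetical helper for building in-memory rows in the demo below.
    static List<Object> row(Object... values) {
        return Arrays.asList(values);
    }

    // Returns the index of the first row that differs, or -1 if both lists are
    // equal position by position. If one list is a prefix of the other, the
    // mismatch is reported at the first index past the shorter list.
    // Note: equality uses equals(), so BigDecimal 1 and Integer 1 differ.
    static int firstMismatch(List<List<Object>> a, List<List<Object>> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!Objects.equals(a.get(i), b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        // Same rows, different order: the sequential check fails at row 0.
        List<List<Object>> oracleRows = Arrays.asList(row("a", 1), row("b", 2));
        List<List<Object>> hiveRows = Arrays.asList(row("b", 2), row("a", 1));
        System.out.println(firstMismatch(oracleRows, hiveRows)); // prints 0
    }
}
```

The demo shows the failure mode the question worries about: identical data in a different order is reported as a mismatch, so the order has to be pinned down by the queries themselves.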

If order isn't preserved whether ORDER BY is a good option?


Answer 1:


Order is not preserved during import. Also, the order of the results is not deterministic when you select without ORDER BY or DISTRIBUTE BY + SORT BY, because the select is processed in parallel.

You need to specify ORDER BY when selecting the data; it does not matter how the data was inserted.
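A sketch of how ORDER BY could be specified for the comparison in the question when there is no primary key: order by every selected column, so the only remaining ties are rows that are identical in every column, and those compare equal anyway. The table and column names are hypothetical, and the sketch assumes all columns are sortable scalar types. NULLS LAST is written explicitly because Oracle sorts NULLs last in ascending order by default while Hive sorts them first; Hive accepts NULLS FIRST/LAST from version 2.1.0. String collation and type conversions can still make the two engines order rows differently, so treat this as a starting point rather than a guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class OrderedQuery {

    // Builds "SELECT c1, c2 FROM t ORDER BY c1 NULLS LAST, c2 NULLS LAST".
    // Ordering by every selected column leaves ties only between fully
    // identical rows, which cannot affect a row-by-row comparison.
    // Identifiers are concatenated into the SQL text (they cannot be bind
    // parameters), so they must come from a trusted source.
    static String orderedQuery(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one column");
        }
        StringJoiner select = new StringJoiner(", ");
        StringJoiner orderBy = new StringJoiner(", ");
        for (String c : columns) {
            select.add(c);
            // Explicit NULL placement, since Oracle and Hive defaults differ.
            orderBy.add(c + " NULLS LAST");
        }
        return "SELECT " + select + " FROM " + table + " ORDER BY " + orderBy;
    }

    public static void main(String[] args) {
        // Hypothetical table and columns, for illustration only.
        List<String> cols = Arrays.asList("customer_id", "order_date", "amount");
        System.out.println(orderedQuery("sales", cols));
    }
}
```

The same query text would be run over JDBC against both Oracle and Hive, and the two ordered results compared sequentially.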

ORDER BY orders all of the data and runs on a single reducer. DISTRIBUTE BY + SORT BY orders the data within each reducer and works in distributed mode.
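To illustrate the difference, here is a simplified in-memory model, not Hive's real partitioner: DISTRIBUTE BY sends each row to reducer (hash mod n), SORT BY sorts each reducer's rows, and the outputs are concatenated in reducer order (reading the output back in that order is itself an assumption of the model). The class name SortBySimulation and the integer rows are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortBySimulation {

    // Simplified model of DISTRIBUTE BY key + SORT BY key with n reducers.
    static List<Integer> distributeAndSortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            buckets.add(new ArrayList<>());
        }
        // DISTRIBUTE BY: pick a reducer from the key's hash.
        for (Integer row : rows) {
            buckets.get(Math.floorMod(row.hashCode(), reducers)).add(row);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket); // SORT BY: order within one reducer only
            out.addAll(bucket);
        }
        return out;
    }

    // ORDER BY: one total sort, equivalent to a single reducer.
    static List<Integer> orderBy(List<Integer> rows) {
        List<Integer> out = new ArrayList<>(rows);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(5, 3, 8, 1, 4, 7, 2, 6);
        System.out.println("ORDER BY: " + orderBy(rows));
        // prints ORDER BY: [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println("DISTRIBUTE BY + SORT BY: " + distributeAndSortBy(rows, 2));
        // prints DISTRIBUTE BY + SORT BY: [2, 4, 6, 8, 1, 3, 5, 7]
    }
}
```

With two reducers, even values land in one reducer and odd values in the other, so each slice is sorted but the combined result is not. With one reducer the model produces the same total order as ORDER BY, which matches the single-reducer behaviour described above.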

Also see this answer https://stackoverflow.com/a/40264715/2700344



Source: https://stackoverflow.com/questions/43367933/does-sqoop-preserves-order-of-imported-rows-as-in-database
