Sqoop Incremental Import

别那么骄傲 2021-01-30 15:27

Need advice on Sqoop incremental imports. Say I have a Customer with Policy 1 on Day 1, I imported those records into HDFS on Day 1, and I can see them in the part files.
On Day 2,

8 answers
  •  無奈伤痛
    2021-01-30 15:56

    In answer to your first question, it depends on how you run the import statement. If you use the --incremental append option, you also specify the --check-column and --last-value arguments. Together these dictate exactly which records are pulled: Sqoop imports the rows whose check column is greater than --last-value and simply appends them to what is already in your target directory.

    For example, you could specify a DATE column for --check-column and a very early date (like '1900-01-01', anything before Day 1 in your case) for --last-value. Every run would then re-import everything in the source table and keep appending it to your destination, creating duplicate rows. In this case, the new part files will hold both new and old records. Using an increasing ID column and always passing a small ID has the same effect. However, if --last-value is the highest value you already imported on Day 1, the Day 2 run adds part files with only the new records. In case you were wondering whether you would lose the old records: you won't, the Day 1 part files stay where they are.
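To make the append case concrete, here is a rough sketch of what the Day 2 command could look like. The host, database, user, table, column, and directory names (dbhost, insurance, etl_user, policies, policy_id, /data/policies) are all made up for illustration; only the Sqoop flags themselves come from the user guide. The function just prints the command so you can check it before running it.

```shell
# Hypothetical names throughout: dbhost, insurance, etl_user, policies,
# policy_id and /data/policies are placeholders, not values from the question.
build_append_import() {
  # $1 = highest check-column value that was already imported
  echo "sqoop import" \
    "--connect jdbc:mysql://dbhost/insurance" \
    "--username etl_user -P" \
    "--table policies" \
    "--target-dir /data/policies" \
    "--incremental append" \
    "--check-column policy_id" \
    "--last-value $1"
}

# Day 2: pass the largest policy_id seen on Day 1 so only newer rows are pulled.
build_append_import 1
```

Passing 0 (or any value below the smallest existing ID) instead of 1 would pull the Day 1 rows again and give you the duplicates described above.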

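    The lastmodified mode of --incremental is only useful if, in the future, you go back and update some of the attributes of an existing row. In that mode the check column should be a timestamp that is set whenever a row changes, and Sqoop pulls the rows modified after --last-value. The updated version of the row that's now in your source table then replaces the old data in your table (and the new rows are added). As far as I know, that replacement is a merge on a key column (--merge-key in newer 1.4.x releases, or the separate merge tool); without it you end up with both versions of the row. Hope this helps!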
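A similar sketch for the lastmodified case, with the same caveats: every name here (dbhost, insurance, etl_user, policies, last_updated, policy_id, /data/policies) and the timestamp are placeholders, and it assumes a Sqoop version whose import tool accepts --merge-key. Again the function only prints the command.

```shell
# Hypothetical names and timestamp; assumes the installed Sqoop supports
# --merge-key on import (newer 1.4.x releases).
build_lastmodified_import() {
  # $1 = timestamp of the previous import; rows modified after it are pulled
  echo "sqoop import" \
    "--connect jdbc:mysql://dbhost/insurance" \
    "--username etl_user -P" \
    "--table policies" \
    "--target-dir /data/policies" \
    "--incremental lastmodified" \
    "--check-column last_updated" \
    "--last-value '$1'" \
    "--merge-key policy_id"
}

# Rows changed since the given (made-up) timestamp are re-imported and merged on policy_id.
build_lastmodified_import '2021-01-01 00:00:00'
```

The idea is that policy_id identifies a row, so an updated policy overwrites its older copy instead of showing up twice.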

    Oh, and all of this is based on Section 7.2.7 of the Sqoop User Guide (https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports) and Chapter 3 of the Apache Sqoop Cookbook (that chapter is actually fantastic!).
