How to remove duplicate rows from flat file using SSIS?

轻奢々 2021-01-12 21:42

Let me first say that being able to take 17 million records from a flat file, pushing to a DB on a remote box and having it take 7 minutes is amazing. SSIS truly is fantasti

9 answers
  •  庸人自扰
    2021-01-12 22:04

    To do this on the flat file itself, I use the Unix command-line tool sort:

    sort -u inputfile > outputfile
    

    Unfortunately, the Windows sort command does not have a unique option, but you could try downloading a sort utility from one of these:

    • http://unxutils.sourceforge.net/
    • http://www.highend3d.com/downloads/tools/os_utils/76.html.

    (I haven't tried them, so no guarantees, I'm afraid).
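
If installing a Unix-style sort on Windows is not an option, the same clean-up can be sketched in a few lines of Python. This is not part of the tooling mentioned above: the function name `dedupe_lines` and the UTF-8 default are assumptions made for illustration. Unlike `sort -u`, it keeps the first occurrence of each line in its original order. It holds one 16-byte digest per distinct line in memory, which is worth keeping in mind at 17 million records.

```python
import hashlib


def dedupe_lines(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, keeping only the first occurrence of each line.

    Returns the number of duplicate lines that were dropped.
    """
    seen = set()
    dropped = 0
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            # Compare without the terminator so a final line that lacks
            # a trailing newline still matches earlier copies.
            key = line.rstrip("\n")
            # Store a fixed-size digest instead of the full line to limit memory.
            digest = hashlib.blake2b(key.encode(encoding), digest_size=16).digest()
            if digest in seen:
                dropped += 1
                continue
            seen.add(digest)
            dst.write(key + "\n")
    return dropped
```

A call such as `dedupe_lines("inputfile", "outputfile")` would then play the role of the `sort -u` command, minus the sorting.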

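    On the other hand, to do this as the records are loaded into the database, you could create a unique index with IGNORE_DUP_KEY on the key columns of the database table. SQL Server then discards any row that duplicates an existing key, raising a warning instead of an error, so the records are made unique very efficiently at load time.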

    CREATE UNIQUE INDEX idx1 ON table_name (col1, col2, ...) WITH (IGNORE_DUP_KEY = ON)
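
To see the load-time behaviour without a SQL Server instance, here is a small sketch that uses SQLite from Python's standard library as a stand-in. SQLite has no IGNORE_DUP_KEY option; `INSERT OR IGNORE` against a unique index is only a rough analogue. The table name `staging`, the two-column key and the function name `load_unique` are hypothetical.

```python
import sqlite3


def load_unique(rows):
    """Insert (col1, col2) rows into an in-memory table, skipping duplicate keys.

    Returns the rows that were actually stored, in insertion order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (col1 TEXT, col2 TEXT)")
    # The unique index defines which columns make a row a duplicate.
    conn.execute("CREATE UNIQUE INDEX idx1 ON staging (col1, col2)")
    # Assumption: SQLite stand-in. OR IGNORE skips rows that violate the
    # unique index instead of failing the whole batch.
    conn.executemany(
        "INSERT OR IGNORE INTO staging (col1, col2) VALUES (?, ?)", rows
    )
    conn.commit()
    stored = conn.execute(
        "SELECT col1, col2 FROM staging ORDER BY rowid"
    ).fetchall()
    conn.close()
    return stored
```

The point of the design is the same as in the SQL Server case: the index does the de-duplication, so the loader can push every row without checking for duplicates first.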
    
