How to remove duplicate rows from flat file using SSIS?

轻奢々 2021-01-12 21:42

Let me first say that being able to take 17 million records from a flat file, push them to a DB on a remote box, and have it all take 7 minutes is amazing. SSIS truly is fantasti

9 Answers
  •  盖世英雄少女心
    2021-01-12 21:52

    The strategy will usually depend on how many columns the staging table has. The more columns, the more complex the solution. The article you linked has some very good advice.

    The only thing I will add to what everybody else has said so far is that columns with date and datetime values will give some of the solutions presented here fits.

    One solution that I came up with is this:

    SET NOCOUNT ON

    DECLARE @email varchar(100)

    SET @email = ''

    -- Seed the loop with the smallest email value in the staging table
    SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

    WHILE @email IS NOT NULL
    BEGIN

        -- Insert a single row for the current email, skipping its duplicates
        INSERT StagingTable2 (Email)
        SELECT TOP (1) email
        FROM StagingTable WITH (NOLOCK)
        WHERE email = @email

        -- Advance to the next distinct email; NULL means no rows are left
        SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

    END
    

    This is a lot faster for deduping than a CURSOR and will not peg the server's CPU. To use it, give each column that comes from the text file its own variable. Populate those variables with a separate SELECT statement before the loop and again inside it, then include them in the INSERT statement. This has worked really well for me.
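
    As a rough sketch of the multi-column version described above, the loop might look something like the following. The FirstName and LastName columns and their variables are hypothetical and only stand in for whatever extra columns the text file carries; the ORDER BY is my own assumption about how to pick which duplicate survives.

```sql
-- Sketch only: FirstName and LastName are hypothetical columns used for illustration.
SET NOCOUNT ON

DECLARE @email varchar(100), @firstName varchar(100), @lastName varchar(100)

-- Seed the loop with the smallest email value
SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > '')

WHILE @email IS NOT NULL
BEGIN
    -- Load the remaining columns from one row that carries this email
    SELECT TOP (1) @firstName = FirstName, @lastName = LastName
    FROM StagingTable WITH (NOLOCK)
    WHERE email = @email
    ORDER BY FirstName, LastName  -- assumed tie-breaker so the surviving row is deterministic

    INSERT StagingTable2 (Email, FirstName, LastName)
    VALUES (@email, @firstName, @lastName)

    -- Move to the next distinct email; NULL ends the loop
    SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)
END
```

    Because each pass jumps straight to the next distinct email, the loop runs once per unique value rather than once per row. An index on the email column would probably help the MIN lookups, though I have not measured that here.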
