How to remove duplicate rows from flat file using SSIS?

轻奢々 2021-01-12 21:42

Let me first say that being able to take 17 million records from a flat file, push them to a DB on a remote box, and have it take 7 minutes is amazing. SSIS truly is fantastic…

9 Answers

盖世英雄少女心 (OP)

2021-01-12 21:52
The strategy will usually depend on how many columns the staging table has. The more columns, the more complex the solution. The article you linked has some very good advice.

The only thing I will add to what everybody else has said so far is that columns with date and datetime values will cause trouble for some of the solutions presented here.

One solution that I came up with is this:
```
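SET NOCOUNT ON

DECLARE @email varchar(100)
DECLARE @emailid varchar(100)

-- Seed with an empty string so the first MIN() returns the smallest email
-- (NULL and blank emails are never visited by this loop)
SET @email = ''

SET @emailid = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

WHILE @emailid IS NOT NULL
BEGIN

    -- Insert exactly one row for the current email; TOP 1 drops the duplicates
    INSERT INTO StagingTable2 (Email)
    SELECT TOP 1 email
    FROM StagingTable WITH (NOLOCK)
    WHERE email = @emailid

    -- Advance past the email just processed, otherwise the loop never ends
    SET @email = @emailid
    SET @emailid = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

END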
```
For deduping, this is a LOT faster than a CURSOR and will not peg the server's CPU. To use it with more columns, put each column that comes from the text file into its own variable, populate those variables with a separate SELECT statement before and inside the loop, and then include them in the INSERT statement. This has worked really well for me.
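Extending the loop to more columns, as described above, might look like the following sketch. It assumes a hypothetical staging table with `Email`, `FirstName`, and `LastName` columns (created here as temp tables `#StagingTable` and `#StagingTable2` with sample rows so the script runs on its own), and it keeps one arbitrary-but-deterministic row per email via `TOP 1 ... ORDER BY`; which duplicate to keep is an assumption you would adjust to your data.

```sql
SET NOCOUNT ON;

-- Hypothetical staging tables with sample data, for illustration only
CREATE TABLE #StagingTable  (Email varchar(100), FirstName varchar(50), LastName varchar(50));
CREATE TABLE #StagingTable2 (Email varchar(100), FirstName varchar(50), LastName varchar(50));

INSERT INTO #StagingTable (Email, FirstName, LastName)
VALUES ('a@example.com', 'Ann', 'Lee'),
       ('a@example.com', 'Ann', 'Lee'),
       ('b@example.com', 'Bob', 'Ray');

-- One variable per column coming from the text file
DECLARE @email     varchar(100) = '';
DECLARE @emailid   varchar(100);
DECLARE @firstName varchar(50);
DECLARE @lastName  varchar(50);

-- SELECT before the loop: the first email greater than the empty-string seed
SET @emailid = (SELECT MIN(Email) FROM #StagingTable WHERE Email > @email);

WHILE @emailid IS NOT NULL
BEGIN
    -- SELECT inside the loop: load the other columns of one row with this email
    SELECT TOP 1 @firstName = FirstName, @lastName = LastName
    FROM #StagingTable
    WHERE Email = @emailid
    ORDER BY FirstName, LastName;

    -- Write exactly one row per email
    INSERT INTO #StagingTable2 (Email, FirstName, LastName)
    VALUES (@emailid, @firstName, @lastName);

    -- Move past the email just processed
    SET @email = @emailid;
    SET @emailid = (SELECT MIN(Email) FROM #StagingTable WHERE Email > @email);
END
```

The loop walks the distinct keys in order with `MIN(Email) ... WHERE Email > @email`. With an index on `Email` each step can be a seek; without one, every iteration scans the staging table, which matters at 17 million rows.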