Let me first say that being able to take 17 million records from a flat file, push them to a DB on a remote box, and have it take 7 minutes is amazing. SSIS truly is fantastic.
The strategy will usually depend on how many columns the staging table has. The more columns, the more complex the solution. The article you linked has some very good advice.
The only thing that I will add to what everybody else has said so far, is that columns with date and datetime values will give some of the solutions presented here fits.
One solution that I came up with is this:
SET NOCOUNT ON

DECLARE @email varchar(100)
SET @email = ''

-- Grab the first (lowest) email value
SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

WHILE @email IS NOT NULL
BEGIN
    -- Do the INSERT based on the email; one row per distinct value, so duplicates are dropped
    INSERT StagingTable2 (Email)
    VALUES (@email)

    -- Advance to the next distinct email
    SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)
END
This is a LOT faster for deduping than a CURSOR, and it will not peg the server's CPU. To use it, pull each column that comes from the text file into its own variable with a SELECT statement inside the loop, then include those variables in the INSERT statement. This has worked really well for me.
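To show what that looks like with more than one column, here is a sketch of the same loop extended to a wider staging table. The columns first_name and last_name (and the matching FirstName/LastName targets) are hypothetical, stand-ins for whatever your flat file actually contains:

```sql
SET NOCOUNT ON

-- @firstname / @lastname are assumed columns, purely for illustration
DECLARE @email varchar(100), @firstname varchar(50), @lastname varchar(50)
SET @email = ''

SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)

WHILE @email IS NOT NULL
BEGIN
    -- Pull the other columns for this email into variables;
    -- TOP 1 picks one representative row among the duplicates
    SELECT TOP 1 @firstname = first_name, @lastname = last_name
    FROM StagingTable WITH (NOLOCK)
    WHERE email = @email

    INSERT StagingTable2 (Email, FirstName, LastName)
    VALUES (@email, @firstname, @lastname)

    SET @email = (SELECT MIN(email) FROM StagingTable WITH (NOLOCK) WHERE email > @email)
END
```

Note that with TOP 1 and no ORDER BY, which duplicate row "wins" is arbitrary; add an ORDER BY if you need a deterministic survivor.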