Removing duplicates from a CTE based on a specific criteria

Question

I want to remove duplicate entries from my CTE based on one criterion: if two records have the same email address, I want to keep only the record that has a refdUserID. The other duplicate record, which has a refdUserID of 0, should be removed.

Answer 1:

The idea is to add a second CTE with an extra column. This extra column assigns a row number to every row based on certain criteria: in your case, partitioning by the email address (i.e. the column you want to compare when deciding which rows are "duplicates"), with an ORDER BY that decides which row you want to keep (I used refdUserID DESC, then referrerID, so a row with a non-zero refdUserID sorts ahead of the row with 0).

Then in the next "layer" of the CTE, I just pick the rows that have a row number of 1, i.e. the top row from each "group".
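To make the idea concrete, here is a minimal, self-contained sketch. The table variable @Joined and its three sample rows are made up for illustration; they only stand in for the combined result set in the question.

```sql
-- Hypothetical sample data standing in for the question's combined rows.
DECLARE @Joined TABLE (
    referrerID INT,
    refdEmail  VARCHAR(100),
    refdUserID INT
);

INSERT INTO @Joined (referrerID, refdEmail, refdUserID)
VALUES (1, 'a@example.com', 0),   -- invitation only, no user yet
       (1, 'a@example.com', 42),  -- same email, but the user has joined
       (1, 'b@example.com', 0);   -- no duplicate for this email

WITH Numbered AS (
    SELECT referrerID,
           refdEmail,
           refdUserID,
           -- Number the rows within each email; the highest refdUserID gets 1.
           ROW_NUMBER() OVER (PARTITION BY refdEmail
                              ORDER BY refdUserID DESC) AS RN
    FROM   @Joined
)
SELECT referrerID, refdEmail, refdUserID
FROM   Numbered
WHERE  RN = 1;  -- keep only the first row of each email group
```

With this sample data the query should return one row per email: the a@example.com row with refdUserID 42, and the single b@example.com row.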

Edit: Updated with actual code as OP posted it...

CREATE PROCEDURE [dbo].[Friendreferralsbyuser]
  @userID INT
AS
    WITH INV ( referrerID, refdEmail, referringTime, referredName, refdUserID )
         AS (SELECT i.userID AS referrerID,
                    i.emailAddress AS refdEmail,
                    i.TIMESTAMP AS referringTime,
                    i.referredName,
                    0 AS refdUserID
             FROM   Invitations AS i
             WHERE  i.userID = @userID),
         INR ( referrerID, refdEmail, joiningtime, referredName, refdUserID )
         AS (SELECT i.referralID AS referrerID,
                    u.email AS refdEmail,
                    i.TIMESTAMP AS joiningtime,
                    u.userName AS referredName,
                    i.userID AS refdUserID
             FROM   InvitationReferrals AS i
                    INNER JOIN Users AS u
                            ON u.userID = i.userID
             WHERE  i.referralID = @userID),
         JOINED ( referrerID, refdEmail, times, referredName, refdUserID )
         AS (SELECT i.referrerID,
                    i.refdEmail,
                    i.referringTime,
                    i.referredName,
                    i.refdUserID
             FROM   INV AS i
             UNION
             SELECT i.referrerID,
                    i.refdEmail,
                    i.joiningtime,
                    i.referredName,
                    i.refdUserID
             FROM   INR AS i),
         ROWNUMBERS (referrerID, refdEmail, times, referredName, refdUserID, RN)
         AS (SELECT referrerID,
                    refdEmail,
                    times,
                    referredName,
                    refdUserID,
                    Row_number()
                      OVER (
                        PARTITION BY refdEmail
                        ORDER BY refdUserID DESC, referrerID) AS RN
             FROM   JOINED)
    SELECT referrerID,
           refdEmail,
           times,
           referredName,
           refdUserID
    FROM   ROWNUMBERS
    WHERE  RN = 1

Answer 2:

I added this to my stored procedure and it worked.

DuplicateSorting AS (
    SELECT *,
           rank() OVER (PARTITION BY refdEmail ORDER BY refduserID DESC) AS rn
    FROM   JOINED),
RemovedDuplicates AS (
    SELECT * FROM DuplicateSorting WHERE rn = 1)
SELECT * FROM RemovedDuplicates
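One thing to watch for with rank(): rows that tie on the ORDER BY column all get the same rank, so if two rows for one email both have refdUserID = 0, both come back with rn = 1. The sketch below compares it with ROW_NUMBER() on made-up data (the @Joined table variable and its rows are assumptions, not data from the question).

```sql
-- Hypothetical data: the same email invited twice, neither row has a user yet.
DECLARE @Joined TABLE (
    refdEmail  VARCHAR(100),
    refdUserID INT,
    times      DATETIME
);

INSERT INTO @Joined (refdEmail, refdUserID, times)
VALUES ('c@example.com', 0, '20140101'),
       ('c@example.com', 0, '20140201');

SELECT refdEmail,
       refdUserID,
       times,
       -- Ties on refdUserID share a rank, so both rows get rk = 1.
       RANK()       OVER (PARTITION BY refdEmail
                          ORDER BY refdUserID DESC)        AS rk,
       -- ROW_NUMBER never repeats; times is added as a tie-breaker
       -- so the numbering is deterministic.
       ROW_NUMBER() OVER (PARTITION BY refdEmail
                          ORDER BY refdUserID DESC, times) AS rn
FROM   @Joined;
```

Here both rows should show rk = 1, while rn is 1 for the earlier row and 2 for the later one. If you need exactly one row per email, ROW_NUMBER() with a tie-breaker is the safer choice.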

Answer 3:

Try this:

 ;With CTE as
    (
    select *, rank() over (partition by refdEmail order by refduserID desc) as r
    from myTable
    )
    -- Note: this physically deletes the lower-ranked rows from myTable
    delete from CTE where r > 1
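Keep in mind that deleting through a CTE removes the rows from the underlying table, which is different from just filtering them out of a result set. A small sketch on a throwaway temp table (#Referrals and its rows are hypothetical) shows the effect; it uses ROW_NUMBER() rather than rank() so that tied rows are reduced to one as well.

```sql
-- Hypothetical temp table so the delete does not touch real data.
CREATE TABLE #Referrals (
    refdEmail  VARCHAR(100),
    refdUserID INT
);

INSERT INTO #Referrals (refdEmail, refdUserID)
VALUES ('a@example.com', 0),
       ('a@example.com', 42),
       ('b@example.com', 0);

WITH Numbered AS (
    SELECT refdEmail,
           refdUserID,
           ROW_NUMBER() OVER (PARTITION BY refdEmail
                              ORDER BY refdUserID DESC) AS r
    FROM   #Referrals
)
-- Deleting from the CTE deletes the matching rows in #Referrals itself.
DELETE FROM Numbered
WHERE  r > 1;

-- The duplicate a@example.com row with refdUserID 0 is now gone for good.
SELECT refdEmail, refdUserID
FROM   #Referrals;

DROP TABLE #Referrals;
```

If you only want the procedure to return de-duplicated rows, as in the question, the SELECT ... WHERE rn = 1 form is probably what you want, since it leaves the stored data alone.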

Source: https://stackoverflow.com/questions/22555408/removing-duplicates-from-a-cte-based-on-a-specific-criteria

Tags

sql-server

common-table-expression