Finding next row in SQL query and deleting it only if previous row matches

Question

I have a table like this.

|-DT--------- |-ID------|
|5/30 12:00pm |10       |
|5/30 01:00pm |30       |
|5/30 02:30pm |30       |
|5/30 03:00pm |50       |
|5/30 04:30pm |10       |
|5/30 05:00pm |10       |
|5/30 06:30pm |10       |
|5/30 07:30pm |10       |
|5/30 08:00pm |50       |
|5/30 09:30pm |10       |

I want to remove a row only when the row that follows it (ordered by DT) has the same ID, so that from each run of consecutive duplicates only the row with the latest datetime is kept. For example, the table above would end up looking like this.

|-DT--------- |-ID------|
|5/30 12:00pm |10       |
|5/30 02:30pm |30       |
|5/30 03:00pm |50       |
|5/30 07:30pm |10       |
|5/30 08:00pm |50       |
|5/30 09:30pm |10       |

Can I get any tips on how this can be done?

Answer 1:

with C as
(
  select ID,
         row_number() over(order by DT) as rn
  from YourTable
)
delete C1
from C as C1
  inner join C as C2
    on C1.rn = C2.rn-1 and
       C1.ID = C2.ID
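
As a quick sanity check of which rows this delete removes, here is a minimal Python sketch (not part of the original answer) that mimics the rn / rn-1 pairing on the question's sample data. The list rows and the helper keep_last_of_each_run are hypothetical names used only for illustration, and the times are written in 24-hour form.

```python
# Sample rows from the question as (dt, id) pairs, already sorted by dt.
rows = [
    ("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50), ("16:30", 10),
    ("17:00", 10), ("18:30", 10), ("19:30", 10), ("20:00", 50), ("21:30", 10),
]


def keep_last_of_each_run(rows):
    """Drop every row whose immediate successor has the same id."""
    kept = []
    for i, (dt, id_) in enumerate(rows):
        is_final_row = i == len(rows) - 1
        # A row survives if nothing follows it or the next id differs.
        if is_final_row or rows[i + 1][1] != id_:
            kept.append((dt, id_))
    return kept


result = keep_last_of_each_run(rows)
for dt, id_ in result:
    print(dt, id_)
```

The surviving rows should match the table the question asks for: within each run of equal IDs, only the latest row remains.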


Answer 2:

Do these 3 steps: http://www.sqlfiddle.com/#!3/b58b9/19

First make the rows sequential:

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
select * from a;

Output:

|                         DT | ID | RN |
----------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |

Second, using the sequential numbers, compare each row with the one directly above it. is_at_bottom is 1 when the row above has a different ID or does not exist, and 0 when the row repeats the ID of the row above:

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
select below.*, 
    case when above.id <> below.id or above.id is null then 
        1 
    else 
        0 
    end as is_at_bottom
from a below
left join a above on above.rn + 1 = below.rn;

Output:

|                         DT | ID | RN | IS_AT_BOTTOM |
-------------------------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |            1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |            1 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |            0 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |            1 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |            1 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |            0 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |            0 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |            0 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |            1 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |            1 |

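Third, delete all rows where is_at_bottom is 0. Note that this keeps the earliest row of each run of equal IDs, whereas the question asks to keep the latest; the simplified queries further down delete the earlier row of each matching pair instead: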

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
,b as 
(
  select below.*, 
       case when above.id <> below.id or above.id is null then 
           1 
       else 
           0 
       end as is_at_bottom
  from a below
  left join a above on above.rn + 1 = below.rn
)
delete a
from a
inner join b on b.rn = a.rn
where b.is_at_bottom = 0;

To verify:

select * from tbl order by dt;

Output:

|                         DT | ID |
-----------------------------------
| May, 30 2012 12:00:00-0700 | 10 |
| May, 30 2012 13:00:00-0700 | 30 |
| May, 30 2012 15:00:00-0700 | 50 |
| May, 30 2012 16:30:00-0700 | 10 |
| May, 30 2012 20:00:00-0700 | 50 |
| May, 30 2012 21:30:00-0700 | 10 |

You can also simplify the deletion to this: http://www.sqlfiddle.com/#!3/b58b9/20

with a as
(
  select dt, id, row_number() over(order by dt, id) as rn
  from tbl
)
delete above
from a below
left join a above on above.rn + 1 = below.rn
where case when above.id <> below.id or above.id is null then 1 else 0 end = 0;

Mikael Eriksson's answer is the best, though: if I simplify my simplified query once more, it ends up looking like his ツ For that, I +1'd his answer. I will just make his query a bit more readable by swapping the join order and using descriptive aliases.

with a as
(
  select *, row_number() over(order by dt, id) as rn
  from tbl
)
delete above
from a below
join a above on above.rn + 1 = below.rn and above.id = below.id;

Live test: http://www.sqlfiddle.com/#!3/b58b9/24
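
For trying the same idea without a SQL Server instance, here is a hedged sketch using Python's built-in sqlite3 module. This is an assumption made for local testing, not something the original answer used. SQLite does not accept the T-SQL "delete alias from ... join" form, so the sketch expresses the same above/below pairing through a rowid IN (...) subquery; it also needs SQLite 3.25 or newer for row_number(). The ISO-style dt strings are illustrative stand-ins for the sample datetimes.

```python
import sqlite3

# In-memory table holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (dt text, id integer)")
conn.executemany(
    "insert into tbl (dt, id) values (?, ?)",
    [
        ("2012-05-30 12:00", 10), ("2012-05-30 13:00", 30),
        ("2012-05-30 14:30", 30), ("2012-05-30 15:00", 50),
        ("2012-05-30 16:30", 10), ("2012-05-30 17:00", 10),
        ("2012-05-30 18:30", 10), ("2012-05-30 19:30", 10),
        ("2012-05-30 20:00", 50), ("2012-05-30 21:30", 10),
    ],
)

# Number the rows, pair each row with the one directly below it,
# and delete the upper row of every pair that shares an id.
conn.execute(
    """
    with a as (
      select rowid as rid, id, row_number() over (order by dt, id) as rn
      from tbl
    )
    delete from tbl
    where rowid in (
      select above.rid
      from a as below
      join a as above on above.rn + 1 = below.rn and above.id = below.id
    )
    """
)

remaining = conn.execute("select dt, id from tbl order by dt").fetchall()
for row in remaining:
    print(row)
```

Because the upper row of each matching pair is the one deleted, the latest row of every run survives, which is what the question asks for.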

Answer 3:

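Here you go; simply replace [Table] with the name of your table. The query returns the rows to keep, that is, every row whose next row (by [Ident]) has a different ID.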

SELECT * 
FROM [dbo].[Table]
WHERE [Ident] NOT IN 
(
    SELECT Extent.[Ident]
    FROM 
    (
        SELECT  TOP 100 PERCENT T1.[DT], 
                T1.[ID],
                T1.[Ident],
                (
                    SELECT TOP 1 Previous.ID
                    FROM [dbo].[Table] AS Previous
                    WHERE Previous.[Ident] > T1.Ident -- this is where the identity seed is important
                    ORDER BY [Ident] ASC
                ) AS 'PreviousId'
        FROM [dbo].[Table] AS T1
        ORDER BY T1.[Ident] DESC
    ) AS Extent
    WHERE [Id] = [PreviousId]
)

Note: You will need an identity column on the table (called [Ident] in the query above) - use a CTE if you can't change the structure of the table.
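
To illustrate the CTE route mentioned in the note, here is a rough sketch, again run through Python's sqlite3 module as a stand-in for SQL Server (an assumption for local testing; SQLite 3.25 or newer is needed for row_number()). The CTE name numbered, the column ident, and the alias nxt are hypothetical; ident is generated on the fly and plays the role of the identity column, so the table itself is left unchanged. It uses NOT EXISTS rather than the TOP 1 subquery shown above.

```python
import sqlite3

# In-memory table with the question's sample data and no identity column.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (dt text, id integer)")
conn.executemany(
    "insert into tbl (dt, id) values (?, ?)",
    [
        ("2012-05-30 12:00", 10), ("2012-05-30 13:00", 30),
        ("2012-05-30 14:30", 30), ("2012-05-30 15:00", 50),
        ("2012-05-30 16:30", 10), ("2012-05-30 17:00", 10),
        ("2012-05-30 18:30", 10), ("2012-05-30 19:30", 10),
        ("2012-05-30 20:00", 50), ("2012-05-30 21:30", 10),
    ],
)

# The CTE supplies a sequential ident; a row is kept unless the row
# with the next ident carries the same id.
kept = conn.execute(
    """
    with numbered as (
      select dt, id, row_number() over (order by dt) as ident
      from tbl
    )
    select t1.dt, t1.id
    from numbered as t1
    where not exists (
      select 1
      from numbered as nxt
      where nxt.ident = t1.ident + 1 and nxt.id = t1.id
    )
    order by t1.dt
    """
).fetchall()

for row in kept:
    print(row)
```

Like the query above, this only selects the rows to keep; it does not delete anything.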

Answer 4:

You can try the following query:

select *
from (
    select *, RANK() OVER (ORDER BY dt, id) AS Rank
    from test
) as a
where 0 = (
    select count(id)
    from (
        select id, RANK() OVER (ORDER BY dt, id) AS Rank
        from test
    ) as b
    where b.id = a.id and b.Rank = a.Rank + 1
)
order by dt

Thanks, Mahesh

Source: https://stackoverflow.com/questions/11589499/finding-next-row-in-sql-query-and-deleting-it-only-if-previous-row-matches

Tags: sql, sql-server-2008, tsql