Deleting Invalid Duplicate Rows in SQL

Question

I have a table that stores employee check-in times, recorded through the Time Machine and keyed by username. If an employee punches in several times, the table ends up with multiple check-in records that are only a few seconds apart. Only the first record is valid; all the other entries are invalid and must be deleted from the table. How can I do this, given that I can select all of an employee's check-in records for the current date?

The data in the table looks like this:

Username    Checktime               CheckType
HRA001      7/29/2012 8:16:44 AM    Check-In
HRA001      7/29/2012 8:16:46 AM    Check-In
HRA001      7/29/2012 8:16:50 AM    Check-In
HRA001      7/29/2012 8:16:53 AM    Check-In

Answer 1:

Try this:

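 -- Rank each user's rows by check-in time; rank 1 is the earliest.
 -- RANK() gives rows with identical Checktime the same rank, so exact ties all survive.
 ;WITH users_CTE AS (
     SELECT RANK() OVER (PARTITION BY Username ORDER BY Checktime) AS rnk
     FROM users
 )
 -- Deleting through the CTE removes the underlying rows from users.
 DELETE FROM users_CTE WHERE rnk <> 1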

-- For your second requirement, try this query:

;WITH users_CTE AS (
    SELECT *, RANK() OVER (PARTITION BY Username ORDER BY Checktime) AS rnk
    FROM users
),
CTE2 AS (
    SELECT Username,
           MIN(CheckTime) AS minTime,
           DATEADD(mi, 1, MIN(CheckTime)) AS maxTime
    FROM users_CTE
    GROUP BY Username
)
DELETE FROM users
WHERE Checktime IN (
    SELECT c1.Checktime
    FROM users_CTE c1
    LEFT JOIN CTE2 c2
        ON c1.Checktime > c2.minTime AND c1.Checktime <= c2.maxTime
    WHERE c2.Username IS NOT NULL
      AND c1.Username IN (
          SELECT c1.Username
          FROM users_CTE c1
          LEFT JOIN CTE2 c2
              ON c1.Checktime > c2.minTime AND c1.Checktime <= c2.maxTime
          GROUP BY c1.Username, c2.Username
          HAVING COUNT(*) > 1
      )
)

-- For your changed requirements, please check the query below:

alter table users add flag varchar(2)
-- Start a new batch so the flag column exists when the UPDATE below is compiled.
GO

;WITH users_CTE AS (
    SELECT *, RANK() OVER (PARTITION BY Username ORDER BY Checktime) AS rnk
    FROM users
),
CTE2 AS (
    SELECT Username,
           MIN(CheckTime) AS minTime,
           DATEADD(mi, 1, MIN(CheckTime)) AS maxTime
    FROM users_CTE
    GROUP BY Username
)
UPDATE u
SET u.flag = 'd'
FROM users_CTE u
INNER JOIN (
    SELECT c1.Checktime
    FROM users_CTE c1
    LEFT JOIN CTE2 c2
        ON c1.Checktime > c2.minTime AND c1.Checktime <= c2.maxTime
    WHERE c2.Username IS NOT NULL
      AND c1.Username IN (
          SELECT c1.Username
          FROM users_CTE c1
          LEFT JOIN CTE2 c2
              ON c1.Checktime > c2.minTime AND c1.Checktime <= c2.maxTime
          GROUP BY c1.Username, c2.Username
          HAVING COUNT(*) > 1
      )
) a
    ON u.Checktime = a.Checktime

-- Check the latest query, which returns a delete flag (Delflag):

;WITH users_CTE AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Username ORDER BY Checktime) AS row
    FROM users
)
,CTE AS (
    -- Anchor: each user's first check-in is always kept.
    SELECT row, Username, Checktime, CheckType, 0 AS totalSeconds, 'N' AS Delflag
    FROM users_CTE
    WHERE row = 1
    UNION ALL
    -- Recursive step: add the gap to the running total. Once it reaches 60 seconds,
    -- keep the row and reset the total; otherwise flag the row for deletion.
    SELECT t.row, t.Username, t.Checktime, t.CheckType,
           CASE WHEN (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime)) >= 60 THEN 0
                ELSE (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime)) END AS totalSeconds,
           CASE WHEN (c.totalSeconds + DATEDIFF(SECOND, c.Checktime, t.Checktime)) >= 60 THEN 'N'
                ELSE 'Y' END AS Delflag
    FROM users_CTE t
    INNER JOIN CTE c
        ON t.row = c.row + 1
       AND t.Username = c.Username  -- stay within the same user's sequence
)

SELECT Username, Checktime, CheckType, Delflag FROM CTE
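
The query above only reports the flag. Here is a sketch of how the same flag could drive the actual delete, run against a throwaway temp table. The table #users, its Id identity column, and the extra 8:17:50 row are assumptions for illustration; Id is there only so the flagged rows can be targeted unambiguously.

```sql
-- Throwaway copy of the table; #users and the Id column are assumptions.
IF OBJECT_ID('tempdb..#users') IS NOT NULL DROP TABLE #users;
CREATE TABLE #users (
    Id        int IDENTITY(1,1) PRIMARY KEY,
    Username  varchar(20) NOT NULL,
    Checktime datetime    NOT NULL,
    CheckType varchar(20) NOT NULL
);

INSERT INTO #users (Username, Checktime, CheckType)
SELECT 'HRA001', '2012-07-29T08:16:44', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:46', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:50', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:53', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:17:50', 'Check-In';  -- made-up later punch

WITH users_CTE AS (
    SELECT Id, Username, Checktime,
           ROW_NUMBER() OVER (PARTITION BY Username ORDER BY Checktime) AS rn
    FROM #users
),
flagged AS (
    -- Anchor: the first row per user is always kept.
    SELECT rn, Id, Username, Checktime, 0 AS totalSeconds, 'N' AS Delflag
    FROM users_CTE
    WHERE rn = 1
    UNION ALL
    -- Accumulate seconds since the last kept row; 60 or more means keep and reset.
    SELECT t.rn, t.Id, t.Username, t.Checktime,
           CASE WHEN c.totalSeconds + DATEDIFF(second, c.Checktime, t.Checktime) >= 60
                THEN 0
                ELSE c.totalSeconds + DATEDIFF(second, c.Checktime, t.Checktime) END,
           CASE WHEN c.totalSeconds + DATEDIFF(second, c.Checktime, t.Checktime) >= 60
                THEN 'N' ELSE 'Y' END
    FROM users_CTE AS t
    JOIN flagged AS c
      ON t.rn = c.rn + 1
     AND t.Username = c.Username
)
DELETE FROM #users
WHERE Id IN (SELECT Id FROM flagged WHERE Delflag = 'Y')
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels

SELECT Username, Checktime FROM #users ORDER BY Username, Checktime;
```

With these sample rows, the 8:16:44 and 8:17:50 punches should be the ones left, since the last one is 66 accumulated seconds after the first.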

Answer 2:

Why don't you verify the check-ins before inserting them into the database? If a check-in already exists for this user within the time window you care about, do nothing; otherwise insert it.
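
A minimal sketch of that check at insert time, in T-SQL. The table #checkins, the variable names, and the 60-second window are assumptions for illustration; the question does not name the table or a threshold.

```sql
-- Hypothetical table mirroring the columns shown in the question.
IF OBJECT_ID('tempdb..#checkins') IS NOT NULL DROP TABLE #checkins;
CREATE TABLE #checkins (
    Username  varchar(20) NOT NULL,
    Checktime datetime    NOT NULL,
    CheckType varchar(20) NOT NULL
);

-- Values of the incoming punch (assumed to be supplied by the application).
DECLARE @Username varchar(20), @Checktime datetime, @CheckType varchar(20);
SET @Username  = 'HRA001';
SET @Checktime = '2012-07-29T08:16:46';
SET @CheckType = 'Check-In';

-- Insert only if the same user has no punch of the same type
-- in the 60 seconds (assumed window) up to and including this one.
INSERT INTO #checkins (Username, Checktime, CheckType)
SELECT @Username, @Checktime, @CheckType
WHERE NOT EXISTS (
    SELECT 1
    FROM #checkins AS c
    WHERE c.Username  = @Username
      AND c.CheckType = @CheckType
      AND c.Checktime >  DATEADD(second, -60, @Checktime)
      AND c.Checktime <= @Checktime
);
```

Running the INSERT a second time with a Checktime a few seconds later should add nothing, because the first row then falls inside the window.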

Answer 3:

You should be able to order all records by time per employee, subtract each record's previous check-in time from its own time and, if the result is less than a certain threshold, delete the row with the more recent time.
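
One way this could look in T-SQL on SQL Server 2005, which has no LAG function: number the rows per user with ROW_NUMBER and join each row to the one before it. The table #checkins, its Id identity column, and the 60-second threshold are assumptions, not part of the question.

```sql
-- Hypothetical table; Id is added so rows can be deleted by key.
IF OBJECT_ID('tempdb..#checkins') IS NOT NULL DROP TABLE #checkins;
CREATE TABLE #checkins (
    Id        int IDENTITY(1,1) PRIMARY KEY,
    Username  varchar(20) NOT NULL,
    Checktime datetime    NOT NULL,
    CheckType varchar(20) NOT NULL
);

INSERT INTO #checkins (Username, Checktime, CheckType)
SELECT 'HRA001', '2012-07-29T08:16:44', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:46', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:50', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:53', 'Check-In';

WITH ordered AS (
    -- Number each user's punches in time order (Id breaks ties).
    SELECT Id, Username, Checktime,
           ROW_NUMBER() OVER (PARTITION BY Username ORDER BY Checktime, Id) AS rn
    FROM #checkins
)
DELETE FROM #checkins
WHERE Id IN (
    SELECT cur.Id
    FROM ordered AS cur
    JOIN ordered AS prev
      ON prev.Username = cur.Username
     AND prev.rn = cur.rn - 1          -- the immediately preceding punch
    WHERE DATEDIFF(second, prev.Checktime, cur.Checktime) < 60  -- assumed threshold
);

SELECT Username, Checktime FROM #checkins ORDER BY Username, Checktime;
```

Because each row is compared with its immediate predecessor rather than with the last row that was kept, a long run of punches spaced a few seconds apart is removed entirely except for its first row.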

Answer 4:

You could RANK the records by check-in time and then delete, for each employee and each day, every record with a rank greater than 1.
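
A sketch of the per-employee, per-day version in T-SQL. It uses ROW_NUMBER rather than RANK so that two rows with an identical Checktime do not both get number 1; the table #checkins and the second day's rows are made up for illustration. The DATEADD/DATEDIFF expression strips the time part, since SQL Server 2005 has no date type.

```sql
-- Hypothetical table mirroring the columns shown in the question.
IF OBJECT_ID('tempdb..#checkins') IS NOT NULL DROP TABLE #checkins;
CREATE TABLE #checkins (
    Username  varchar(20) NOT NULL,
    Checktime datetime    NOT NULL,
    CheckType varchar(20) NOT NULL
);

INSERT INTO #checkins (Username, Checktime, CheckType)
SELECT 'HRA001', '2012-07-29T08:16:44', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:46', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:50', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:53', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-30T08:20:01', 'Check-In' UNION ALL  -- made-up next day
SELECT 'HRA001', '2012-07-30T08:20:04', 'Check-In';

WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Username,
                            CheckType,
                            DATEADD(day, DATEDIFF(day, 0, Checktime), 0)  -- midnight of that day
               ORDER BY Checktime
           ) AS rn
    FROM #checkins
)
-- Deleting through the CTE removes the underlying rows from #checkins.
DELETE FROM ranked
WHERE rn > 1;

SELECT Username, Checktime FROM #checkins ORDER BY Username, Checktime;
```

This keeps only the earliest row per user, type and day, so a genuine second check-in later the same day would also be removed.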

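Answer 5:

Try this query: DELETE FROM employee WHERE employee.checkin IN (SELECT checkin FROM employee GROUP BY checkin HAVING COUNT(checkin) > 1); Note that this matches only rows whose checkin values are exactly equal, and it deletes every such row, including the first.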

Answer 6:

http://codesimplified.com/2010/10/18/remove-duplicate-records-from-the-database-table/

Hope this helps.

Answer 7:

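-- @todaysDate and @empId stand for the date and the employee being cleaned up.
DELETE FROM timesheet
WHERE timesheetRecordId <> (
                -- the earliest check-in for that employee on that date
                SELECT TOP 1 timesheetRecordId FROM timesheet
                WHERE checkInDate = @todaysDate AND employeeId = @empId
                ORDER BY checkInTime ASC
               )
AND checkInDate = @todaysDate AND employeeId = @empId;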

Answer 8:

I don't think you can reference the target table of a DELETE statement in a subquery of that same statement, so you can't do it with a single DELETE statement.

What you can do is write a stored procedure. In it, create a temporary table containing the PKs returned by this query:

select cht.pkey 
  from CheckTimeTable as cht
  where exists ( select pkey
                   from CheckTimeTable 
                   where username = cht.userName
                     and checkType = 'check-IN'
                     and Checktime >= subtime(cht.Checktime, '0 0:0:15.000000') 
                     and Checktime < cht.Checktime);

Then write another statement to delete those PKs from your original table, CheckTimeTable.

Note that the query above is for MySQL, so you'll need to find out how to subtract 15 seconds from a timestamp in your DBMS. In MySQL it's done like this:

subtime(cht.Checktime, '0 0:0:15.000000')

This query returns every record for which the same user has another record of type Check-In up to 15 seconds earlier than its own Checktime.
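
Since the question is tagged sql-server-2005, here is a rough T-SQL counterpart, where DATEADD(second, -15, ...) plays the role of subtime. SQL Server does accept a subquery on the DELETE target, so this sketch deletes directly instead of going through a temporary table of PKs. The temp table #CheckTimeTable and its sample rows are assumptions for illustration.

```sql
-- Hypothetical stand-in for CheckTimeTable.
IF OBJECT_ID('tempdb..#CheckTimeTable') IS NOT NULL DROP TABLE #CheckTimeTable;
CREATE TABLE #CheckTimeTable (
    pkey      int IDENTITY(1,1) PRIMARY KEY,
    username  varchar(20) NOT NULL,
    Checktime datetime    NOT NULL,
    checkType varchar(20) NOT NULL
);

INSERT INTO #CheckTimeTable (username, Checktime, checkType)
SELECT 'HRA001', '2012-07-29T08:16:44', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:46', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:50', 'Check-In' UNION ALL
SELECT 'HRA001', '2012-07-29T08:16:53', 'Check-In';

-- Delete every row that has an earlier Check-In by the same user
-- within the preceding 15 seconds.
DELETE cht
FROM #CheckTimeTable AS cht
WHERE EXISTS (
    SELECT 1
    FROM #CheckTimeTable AS earlier
    WHERE earlier.username  = cht.username
      AND earlier.checkType = 'Check-In'
      AND earlier.Checktime >= DATEADD(second, -15, cht.Checktime)
      AND earlier.Checktime <  cht.Checktime
);

SELECT username, Checktime FROM #CheckTimeTable ORDER BY username, Checktime;
```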

Source: https://stackoverflow.com/questions/11839893/deleting-invalid-duplicate-rows-in-sql

Tags

ASP.NET

sql

sql-server-2005