Getting distinct rows for overlapping timestamps - SQL Server

Question

I have the following source table, which holds records with the start and end timestamps of a person logging in and logging out.

employeeNumber |    start_time           |  end_time
john           |   10/02/2020 16.30.000  |  11/02/2020 02.00.000
john           |   10/02/2020 20.00.000  |  10/02/2020 22.00.000
john           |   10/02/2020 23.00.000  |  11/02/2020 01.00.000
rick           |   10/02/2020 10.00.000  |  10/02/2020 11.00.000
rick           |   10/02/2020 13.00.000  |  10/02/2020 14.30.000
tom            |   10/02/2020 09.00.000  |  10/02/2020 18.00.000

As you can see, john has 3 overlapping records, rick has 2 non-overlapping records, and tom has only 1 record.
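
To reproduce the setup, the sample rows can be loaded with a script along these lines. This is a sketch under assumptions: the table name emp_data and the column names come from the query further down, while the datetime column type, the varchar length, and the reading of 10/02/2020 16.30.000 as 10 February 2020, 16:30 are guesses. The final select lists each pair of rows for the same employee whose time ranges intersect.

```sql
-- Assumed schema: only the table and column names come from the question.
create table emp_data (
    employeeNumber varchar(50) not null,
    start_time     datetime    not null,
    end_time       datetime    not null
);

-- ISO 8601 literals avoid the dd/mm vs mm/dd ambiguity of the printed values.
insert into emp_data (employeeNumber, start_time, end_time)
values ('john', '2020-02-10T16:30:00', '2020-02-11T02:00:00'),
       ('john', '2020-02-10T20:00:00', '2020-02-10T22:00:00'),
       ('john', '2020-02-10T23:00:00', '2020-02-11T01:00:00'),
       ('rick', '2020-02-10T10:00:00', '2020-02-10T11:00:00'),
       ('rick', '2020-02-10T13:00:00', '2020-02-10T14:30:00'),
       ('tom',  '2020-02-10T09:00:00', '2020-02-10T18:00:00');

-- Pairs of rows per employee that overlap; "a" is the row that starts first
-- (ties on start_time are broken by end_time so each pair appears once).
select a.employeeNumber,
       a.start_time as first_start,  a.end_time as first_end,
       b.start_time as second_start, b.end_time as second_end
from emp_data a
join emp_data b
  on b.employeeNumber = a.employeeNumber
 and (a.start_time < b.start_time
      or (a.start_time = b.start_time and a.end_time < b.end_time))
 and b.start_time <= a.end_time;   -- b starts before a has ended
```

With these rows the check should return two pairs for john (the long session against each of the two shorter ones) and nothing for rick or tom.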

Hence, I want the result to look as follows:

john   |   10/02/2020 16.30.000  |  11/02/2020 02.00.000
rick   |   10/02/2020 10.00.000  |  10/02/2020 11.00.000
rick   |   10/02/2020 13.00.000  |  10/02/2020 14.30.000
tom    |   10/02/2020 09.00.000  |  10/02/2020 18.00.000

After some research and a lot of help from @Gordon Linoff, the following SQL got me close to the result.

with e as (
    -- For each employee and timestamp, attach the running sum of the +1/-1 events.
    select t1.*, s.final_inc
    from (
        -- +1 at each start_time, -1 at each end_time, netted per timestamp
        select e.employeeNumber, v.dt, sum(v.inc) as inc
        from emp_data e
        cross apply (values (start_time, 1),
                            (end_time, -1)
                    ) v(dt, inc)
        group by e.employeeNumber, v.dt
    ) t1
    outer apply (
        -- running sum up to t1.dt, used in place of sum() over (order by ...)
        select sum(t2.inc) as final_inc
        from (
            select e.employeeNumber, v.dt, sum(v.inc) as inc
            from emp_data e
            cross apply (values (start_time, 1),
                                (end_time, -1)
                        ) v(dt, inc)
            group by e.employeeNumber, v.dt
        ) t2
        where t2.employeeNumber = t1.employeeNumber
          and t2.dt <= t1.dt
    ) s
)
select employeeNumber, min(dt) as start_datetime, max(dt) as end_datetime
from (
    select e.*,
           -- count of timestamps up to and including this one where the running sum is 0
           (select sum(case when e2.final_inc = 0 then 1 else 0 end)
            from e e2
            where e2.employeeNumber = e.employeeNumber
              and e2.dt <= e.dt
           ) as grp
    from e
) e
where final_inc <> 0
group by employeeNumber, grp;

Here is the DB fiddle with the queries I have used so far. In the fiddle, the second query is the one suggested by @Gordon. However, since the compatibility level of my SQL Server is 100, it does not support order by alongside sum() over, so I used outer apply to compute the running sum in my next query.
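
For reference, the two ways of computing the running sum can be put side by side. This is only a sketch on a small hypothetical table variable @ev whose values mirror rick's rows; the windowed form needs SQL Server 2012 or later, while the apply form also runs on older versions.

```sql
-- Hypothetical event list: +1 at each login, -1 at each logout.
declare @ev table (employeeNumber varchar(50), dt datetime, inc int);

insert into @ev (employeeNumber, dt, inc)
values ('rick', '2020-02-10T10:00:00',  1),
       ('rick', '2020-02-10T11:00:00', -1),
       ('rick', '2020-02-10T13:00:00',  1),
       ('rick', '2020-02-10T14:30:00', -1);

-- SQL Server 2012+: running sum with an ordered window.
-- Remove this statement on older versions, where it does not parse.
select employeeNumber, dt,
       sum(inc) over (partition by employeeNumber order by dt) as final_inc
from @ev;

-- Older versions: the same running sum through a correlated apply.
select t1.employeeNumber, t1.dt, s.final_inc
from @ev t1
outer apply (select sum(t2.inc) as final_inc
             from @ev t2
             where t2.employeeNumber = t1.employeeNumber
               and t2.dt <= t1.dt) s;
```

Both statements should report a running sum of 1, 0, 1, 0 for the four timestamps, so a value of 0 marks the moment the employee is fully logged out.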

The above query now gives me the following output:

john   |   10/02/2020 16.30.000  |  11/02/2020 01.00.000
rick   |   10/02/2020 10.00.000  |  10/02/2020 10.00.000
tom    |   10/02/2020 09.00.000  |  10/02/2020 09.00.000
rick   |   10/02/2020 13.00.000  |  10/02/2020 13.00.000

So I am facing two issues:

For rick's two rows and tom's single row, the result shows the start time in both the start and end columns.
For john, it returned a single record with the correct start time of 10/02/2020 16.30.000, but the end time it picked is 11/02/2020 01.00.000, whereas it should be 11/02/2020 02.00.000.

Any help is appreciated.
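
One possible direction, sketched here without having been run against the real table: both symptoms appear to come from the last step of the query. The where final_inc <> 0 filter discards the row at which the running sum returns to 0, which is exactly the end of a merged session, and counting zeros with e2.dt <= e.dt would place that closing row in the next group anyway. The variant below counts only the zeros strictly before each row and keeps every row. The CTE names ev, running and grouped are made up for illustration, and emp_data is assumed to have datetime columns named as in the question.

```sql
with ev as (
    -- +1 at each start_time, -1 at each end_time, netted per employee and timestamp
    select e.employeeNumber, v.dt, sum(v.inc) as inc
    from emp_data e
    cross apply (values (e.start_time, 1),
                        (e.end_time, -1)
                ) v(dt, inc)
    group by e.employeeNumber, v.dt
),
running as (
    -- running sum of inc per employee, without sum() over (order by ...)
    select t1.employeeNumber, t1.dt, s.final_inc
    from ev t1
    cross apply (select sum(t2.inc) as final_inc
                 from ev t2
                 where t2.employeeNumber = t1.employeeNumber
                   and t2.dt <= t1.dt) s
),
grouped as (
    -- group id = number of earlier timestamps at which the employee was fully logged out
    select r.employeeNumber, r.dt,
           (select count(*)
            from running r2
            where r2.employeeNumber = r.employeeNumber
              and r2.dt < r.dt            -- strictly earlier, so a closing row stays in its own group
              and r2.final_inc = 0) as grp
    from running r
)
select employeeNumber, min(dt) as start_datetime, max(dt) as end_datetime
from grouped
group by employeeNumber, grp
order by employeeNumber, start_datetime;
```

For the sample rows this should yield one row for john (16.30 to 02.00 the next day), two rows for rick, and one row for tom, which matches the desired result above.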

Source: https://stackoverflow.com/questions/60869021/getting-distinct-rows-for-overlapping-timestamp-sql-server

Tags

sql

sql-server

database

select

logic