Getting distinct rows for overlapping timestamp in SQL Server

Submitted by 冷暖自知 on 2020-04-07 06:53:11

Question


I have the following result set which I get from SQL Server:

employeeNumber | start_date | start_time | end_date     | end_time
---------------+------------+------------+--------------+----------
123            | 10-03-2020 |  18:13:55  |  10-03-2020  | 22:59:46
123            | 10-03-2020 |  18:24:22  |  10-03-2020  | 22:59:51
123            | 10-03-2020 |  23:24:22  |  10-03-2020  | 23:59:51
123            | 11-03-2020 |  18:25:25  |  11-03-2020  | 20:59:51
123            | 12-03-2020 |  18:40:22  |  12-03-2020  | 22:59:52

In some cases I have multiple rows covering the same overlapping time period (rows 1 and 2 above), but with slightly different start and end times (a difference of seconds or minutes).

My query is a simple SELECT that fetches the data from the source table. What can I add to the WHERE clause to fetch distinct rows for such overlapping timestamps? That is, for the above query I would want the result set to be the following:

employeeNumber | start_date | start_time | end_date     | end_time    
---------------+------------+------------+--------------+----------
123            | 10-03-2020 |  18:13:55  |  10-03-2020  | 22:59:46
123            | 10-03-2020 |  23:24:22  |  10-03-2020  | 23:59:51
123            | 11-03-2020 |  18:25:25  |  11-03-2020  | 20:59:51
123            | 12-03-2020 |  18:40:22  |  12-03-2020  | 22:59:52

Below is my query:

select 
    employeeNumber, start_date, start_time, end_date, end_time
from 
    emp_data
where 
    employeeNumber = 123
order by 
    employeeNumber;

I could probably make do with fetching only the first record, but what would the WHERE clause be?

Any help is appreciated as I am not very familiar with SQL Server.


Answer 1:


This is complicated. You need to keep track of "starts" and "ends". I am going to assume that your columns are datetimes or something similar that can be combined into a single column:

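-- Assumes the date and time columns are datetime values, so they can be added together.
with e as (
      -- Unpivot each row into a start event (+1) and an end event (-1),
      -- then keep a running total of open intervals per employee.
      select e.employeeNumber, v.dt, sum(v.inc) as inc,
             sum(sum(v.inc)) over (partition by e.employeeNumber order by v.dt) as in_outs
      from emp_data e cross apply
           (values (start_date + start_time, 1),
                   (end_date + end_time, -1)
           ) v(dt, inc)
      group by e.employeeNumber, v.dt
     )
select employeeNumber, min(dt) as start_datetime, max(dt) as end_datetime
from (select e.*,
             -- grp counts how many times the running total has returned to zero so far
             sum(case when in_outs = 0 then 1 else 0 end) over (partition by employeeNumber order by dt) as grp
      from e
     ) e
-- A closing event (in_outs = 0) has already been counted in its own grp,
-- so subtract it back out to keep the island's end in the same group as its start.
group by employeeNumber, grp - (case when in_outs = 0 then 1 else 0 end);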

Here is a db<>fiddle.

What is this doing?

  • First, the separate date and time columns are combined into single datetime values.
  • Then the columns are unpivoted and identified as starts and ends, along with +1 or -1 to indicate whether the employee is "entering" or "exiting" at that time.
  • These are accumulated into a running total per employee.
  • Now you have a gaps-and-islands problem, where you want to find continuous periods of "in"s. The "islands" are identified using a cumulative count of the points where the running total returns to zero.
  • Then each island is aggregated to its earliest and latest datetime.
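
To make the steps above concrete outside SQL, here is a minimal Python sketch of the same +1/-1 sweep, run on the sample rows from the question. The function name merge_intervals and the day-month-year date format are assumptions made for illustration; this is not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime


def merge_intervals(intervals):
    """Merge overlapping (start, end) datetime pairs using a +1/-1 sweep."""
    # Unpivot: every interval adds +1 at its start and -1 at its end.
    # Events that share a timestamp are summed, like GROUP BY dt in the query.
    delta = defaultdict(int)
    for start, end in intervals:
        delta[start] += 1
        delta[end] -= 1

    merged = []
    open_count = 0       # running total, the query's in_outs
    island_start = None
    for dt in sorted(delta):
        if open_count == 0:
            island_start = dt                  # first event of a new island
        open_count += delta[dt]
        if open_count == 0:
            merged.append((island_start, dt))  # the island closes here
    return merged


# Assumption: the sample dates are day-month-year.
FMT = "%d-%m-%Y %H:%M:%S"
rows = [
    ("10-03-2020 18:13:55", "10-03-2020 22:59:46"),
    ("10-03-2020 18:24:22", "10-03-2020 22:59:51"),
    ("10-03-2020 23:24:22", "10-03-2020 23:59:51"),
    ("11-03-2020 18:25:25", "11-03-2020 20:59:51"),
    ("12-03-2020 18:40:22", "12-03-2020 22:59:52"),
]
intervals = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in rows]
result = merge_intervals(intervals)
for start, end in result:
    print(start, end)
```

Note that this approach merges overlapping rows rather than picking one of them: the first island runs from 18:13:55 to 22:59:51, the latest end time among the two overlapping rows, not the 22:59:46 of the first row alone.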

EDIT:

You can replace the cumulative sum with:

from (select e.*,
             (select sum(case when e2.in_outs = 0 then 1 else 0 end) 
              from e e2
              where e2.employeeNumber = e.employeeNumber and
                    e2.dt <= e.dt
             ) as grp
      from e
     ) e
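
As a rough illustration of why the two forms are interchangeable, the Python sketch below computes grp both ways for the running totals that the sample data produces: once as a single running sum (the window function) and once by re-counting all earlier rows for every row (the correlated subquery). The variable names are hypothetical, and the sketch relies on each dt being distinct, which the GROUP BY in the CTE guarantees.

```python
from itertools import accumulate

# Running totals (in_outs) for the sample data, ordered by dt.
in_outs = [1, 2, 1, 0, 1, 0, 1, 0, 1, 0]

# 1 where the running total is back at zero, otherwise 0.
flags = [1 if v == 0 else 0 for v in in_outs]

# Window-function form: one pass, carrying the sum forward.
windowed = list(accumulate(flags))

# Correlated-subquery form: for each row, re-sum every row with dt <= this dt.
correlated = [sum(flags[: i + 1]) for i in range(len(flags))]

print(windowed)
print(correlated)
```

Both lists come out identical. The subquery form does quadratic work, so it is mainly useful where ordered window aggregates are unavailable, such as SQL Server versions before 2012.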


Source: https://stackoverflow.com/questions/60789087/getting-distinct-rows-for-overlapping-timestamp-in-sql-server
