Tracking continuous days of absence from work days only SQL

Question

I'm trying to create a table that takes the dates on which an employee was sick and adds a new "sickness ID" column identifying each unique instance of absence spanning several dates. I've managed to do this, but I now need to factor in a table containing each employee's working pattern, which tells me whether someone was due in work on a given day of the week.

This can be joined using the day_no column in both tables along with the employee_number.

I posted this question earlier and got a great solution from @GMB, but I now need to add the working hours to it.

I have table called sickness which looks like this

date_sick   day_no  day_name    employee_number hours_lost  working_hours   
2020-07-14  2       Tuesday     001             7.5         7.5             
2020-07-15  3       Wednesday   001             7.5         7.5             
2020-07-16  4       Thursday    001             7.5         7.5             
2020-07-17  5       Friday      001             7.5         7.5             
2020-07-21  2       Tuesday     001             7.5         7.5             
2020-07-22  3       Wednesday   001             7.5         7.5             
2020-07-23  4       Thursday    001             7.5         7.5             
2020-07-24  5       Friday      001             7.5         7.5             
2020-07-28  2       Tuesday     001             7.5         7.5             
2020-07-29  3       Wednesday   001             7.5         7.5             
2020-07-30  4       Thursday    001             7.5         7.5             
2020-07-31  5       Friday      001             7.5         7.5             
2020-09-09  3       Wednesday   001             7.5         7.5             
2020-09-10  4       Thursday    001             7.5         7.5             
2020-07-22  3       Wednesday   002             8           8               
2020-07-23  4       Thursday    002             8           8

And my working hours table looks like this:

employee_number day_no working_hours
001             1      0
001             2      7.5
001             3      7.5
001             4      7.5
001             5      7.5
001             6      0
001             7      0
002             1      8
002             2      8
002             3      8
002             4      8
002             5      8
002             6      0
002             7      0

Using the following statement, I'm able to assign a sickness ID that identifies each run of consecutive absence dates for an employee:

IF OBJECT_ID('dbo.sickness ', 'u') IS NOT NULL DROP TABLE dbo.sickness 
CREATE TABLE dbo.sickness (date_sick date
                        , day_no int
                        , day_name varchar(10)
                        , employee_number char(5)
                        , hours_lost float
                        , working_hours float)
INSERT INTO dbo.sickness (date_sick, day_no, day_name, Employee_Number, Hours_Lost, Working_Hours)
VALUES 
('2020-07-14', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-15', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-16', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-17', '5', 'Friday', '001', '7.5', '7.5'),
('2020-07-21', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-22', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-23', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-24', '5', 'Friday', '001', '7.5', '7.5'),
('2020-07-28', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-29', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-30', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-31', '5', 'Friday', '001', '7.5', '7.5'),
('2020-09-09', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-09-10', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-22', '3', 'Wednesday', '002', '8', '8'),
('2020-07-23', '4', 'Thursday', '002', '8', '8')

GO

IF OBJECT_ID('dbo.working_hours ', 'u') IS NOT NULL DROP TABLE dbo.working_hours 
CREATE TABLE dbo.working_hours (employee_number char(5)
                            , day_no int
                            , working_hours float)

INSERT INTO dbo.working_hours (employee_number, day_no, working_hours)
VALUES 
('001', '1', '0'),
('001', '2', '7.5'),
('001', '3', '7.5'),
('001', '4', '7.5'),
('001', '5', '7.5'),
('001', '6', '0'),
('001', '7', '0'),
('002', '1', '8'),
('002', '2', '8'),
('002', '3', '8'),
('002', '4', '8'),
('002', '5', '8'),
('002', '6', '0'),
('002', '7', '0');


WITH CTE AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY employee_number ORDER BY date_sick) AS rn
    FROM dbo.sickness s)

SELECT c.date_sick,
       c.day_no,
       c.day_name,
       c.employee_number,
       c.hours_lost,
       w.working_hours,
       DENSE_RANK() OVER (ORDER BY C.employee_number, DATEADD(DAY, -C.rn, C.date_sick)) AS sickness_id
FROM CTE C
    JOIN working_hours w
        ON  c.employee_number = w.employee_number
        AND c.day_no = w.day_no

ORDER BY C.employee_number,
         C.date_sick
DROP TABLE dbo.sickness
DROP TABLE dbo.working_hours

This outputs the following table:

date_sick   day_no  day_name    employee_number hours_lost  working_hours   sickness_id
2020-07-14  2       Tuesday     001             7.5         7.5             1
2020-07-15  3       Wednesday   001             7.5         7.5             1
2020-07-16  4       Thursday    001             7.5         7.5             1
2020-07-17  5       Friday      001             7.5         7.5             1
2020-07-21  2       Tuesday     001             7.5         7.5             2
2020-07-22  3       Wednesday   001             7.5         7.5             2
2020-07-23  4       Thursday    001             7.5         7.5             2
2020-07-24  5       Friday      001             7.5         7.5             2
2020-07-28  2       Tuesday     001             7.5         7.5             3
2020-07-29  3       Wednesday   001             7.5         7.5             3
2020-07-30  4       Thursday    001             7.5         7.5             3
2020-07-31  5       Friday      001             7.5         7.5             3
2020-09-09  3       Wednesday   001             7.5         7.5             4
2020-09-10  4       Thursday    001             7.5         7.5             4
2020-07-22  3       Wednesday   002             8           8               5
2020-07-23  4       Thursday    002             8           8               5

The issue is that this only groups consecutive days within the same week. Because employee 001 is not scheduled to work on Saturday, Sunday or Monday, the first 12 rows should all have the same sickness ID. What I want is the following table:

date_sick   day_no  day_name    employee_number hours_lost  working_hours   sickness_id
2020-07-14  2       Tuesday     001             7.5         7.5             1
2020-07-15  3       Wednesday   001             7.5         7.5             1
2020-07-16  4       Thursday    001             7.5         7.5             1
2020-07-17  5       Friday      001             7.5         7.5             1
2020-07-21  2       Tuesday     001             7.5         7.5             1
2020-07-22  3       Wednesday   001             7.5         7.5             1
2020-07-23  4       Thursday    001             7.5         7.5             1
2020-07-24  5       Friday      001             7.5         7.5             1
2020-07-28  2       Tuesday     001             7.5         7.5             1
2020-07-29  3       Wednesday   001             7.5         7.5             1
2020-07-30  4       Thursday    001             7.5         7.5             1
2020-07-31  5       Friday      001             7.5         7.5             1
2020-09-09  3       Wednesday   001             7.5         7.5             2
2020-09-10  4       Thursday    001             7.5         7.5             2
2020-07-22  3       Wednesday   002             8           8               3
2020-07-23  4       Thursday    002             8           8               3

Any ideas? Maybe connecting it to a calendar table?

Answer 1:

As I mention in the comment, just use a WHERE. This is, of course, a blind guess due to a lack of sample data (the sample has no working hours data):

--I prefer CTEs over subqueries
WITH CTE AS(
    SELECT s.date_sick,
           s.day_no,
           s.employee_number,
           ROW_NUMBER() OVER (PARTITION BY s.employee_number ORDER BY s.date_sick) AS rn
    FROM dbo.sickness s)
SELECT C.date_sick,
       C.employee_number,
       DENSE_RANK() OVER (ORDER BY C.employee_number, DATEADD(DAY, -C.rn, C.date_sick)) AS sickness_id,
       wh.working_hours
FROM CTE C
     JOIN dbo.working_hours wh ON C.employee_number = wh.employee_number
                              AND C.day_no = wh.day_no --match the weekday as well as the employee
WHERE wh.working_hours > 0
ORDER BY C.employee_number,
         C.date_sick;
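
The date-minus-row-number trick treats every calendar gap as a break, so a weekend or a scheduled day off splits one absence into several. One possible way around that, sketched here against the question's sample tables, is to number each employee's scheduled working days and subtract the row number from that sequence instead of from the date. This is only a sketch under assumptions: day_no 1 is taken to be Monday and 7 Sunday (inferred from the sample rows), every sick date is assumed to fall on a scheduled working day, and the CTE names (bounds, numbers, calendar, work_days, sick) and the wd_seq column are made up for illustration. The tally built from sys.all_objects is just one convenient row source and caps each employee's date range at 10,000 days.

```sql
WITH bounds AS (
    -- First and last sick date per employee: the range the calendar must cover
    SELECT employee_number,
           MIN(date_sick) AS first_day,
           MAX(date_sick) AS last_day
    FROM dbo.sickness
    GROUP BY employee_number
),
numbers AS (
    -- Ad hoc tally 0..9999 (assumption: no employee's range exceeds 10,000 days)
    SELECT TOP (10000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS n
    FROM sys.all_objects a
         CROSS JOIN sys.all_objects b
),
calendar AS (
    -- One row per employee per calendar day in that employee's range
    SELECT b.employee_number,
           DATEADD(DAY, n.n, b.first_day) AS cal_date
    FROM bounds b
         JOIN numbers n ON n.n <= DATEDIFF(DAY, b.first_day, b.last_day)
),
work_days AS (
    -- Keep scheduled working days only and number them consecutively.
    -- 1900-01-01 was a Monday, so the expression below yields 1 = Monday ... 7 = Sunday
    -- regardless of the session's DATEFIRST setting.
    SELECT c.employee_number,
           c.cal_date,
           ROW_NUMBER() OVER (PARTITION BY c.employee_number ORDER BY c.cal_date) AS wd_seq
    FROM calendar c
         JOIN dbo.working_hours w
              ON w.employee_number = c.employee_number
             AND w.day_no = DATEDIFF(DAY, '19000101', c.cal_date) % 7 + 1
    WHERE w.working_hours > 0
),
sick AS (
    -- Attach the working-day sequence number to each sick day
    SELECT s.*,
           wd.wd_seq,
           ROW_NUMBER() OVER (PARTITION BY s.employee_number ORDER BY s.date_sick) AS rn
    FROM dbo.sickness s
         JOIN work_days wd
              ON wd.employee_number = s.employee_number
             AND wd.cal_date = s.date_sick
)
SELECT date_sick, day_no, day_name, employee_number, hours_lost, working_hours,
       -- wd_seq - rn stays constant while sick days fall on consecutive working days
       DENSE_RANK() OVER (ORDER BY employee_number, wd_seq - rn) AS sickness_id
FROM sick
ORDER BY employee_number,
         date_sick;
```

With this numbering, Friday 2020-07-17 and Tuesday 2020-07-21 are adjacent working days for employee 001, so they land in the same group, while the jump to 2020-09-09 skips many scheduled days and starts a new one. A permanent calendar table, as the question suggests, could replace the bounds, numbers and calendar CTEs.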

Answer 2:

I think that using lag() to see if the sickness days are consecutive and then a cumulative sum is a better approach for assigning the sickness id.

I am a little unclear on what you want exactly. But here is one approach:

select date_sick, employee_number,
       sum(case when working_hours > 0 and prev_working_hours > 0 and
                     dateadd(day, -1, date_sick) = prev_date_sick
                then 0 else 1
           end) over (partition by employee_number order by date_sick) as sickness_id
from (select s.date_sick, s.employee_number, wh.working_hours,
             lag(s.date_sick) over (partition by s.employee_number order by s.date_sick) as prev_date_sick,
             lag(wh.working_hours) over (partition by s.employee_number order by s.date_sick) as prev_working_hours
      from sickness s left join
           working_hours wh
           on s.employee_number = wh.employee_number and
              s.day_no = wh.day_no  -- match on employee and weekday
     ) s
order by employee_number, date_sick;
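
To make the lag() idea bridge non-working days, the "previous day" test can be replaced by a check for any scheduled working day lying strictly between the previous sick date and the current one. The following is a hedged sketch of that variation, not the answerer's code. It assumes day_no 1 is Monday and 7 is Sunday, that every sick date falls on a scheduled working day (which is why a gap of more than 7 days is treated as a break outright), and it introduces the illustrative names ordered, flagged, spells, is_new_spell and spell_no.

```sql
WITH ordered AS (
    -- Previous sick date for the same employee
    SELECT s.*,
           LAG(s.date_sick) OVER (PARTITION BY s.employee_number
                                  ORDER BY s.date_sick) AS prev_date_sick
    FROM dbo.sickness s
),
flagged AS (
    SELECT o.*,
           CASE
               WHEN o.prev_date_sick IS NULL THEN 1                      -- first absence for this employee
               WHEN DATEDIFF(DAY, o.prev_date_sick, o.date_sick) > 7 THEN 1 -- a whole week was skipped
               WHEN EXISTS (
                        -- Is any day strictly between the two dates a scheduled working day?
                        SELECT 1
                        FROM (VALUES (1), (2), (3), (4), (5), (6)) AS t(n)
                             JOIN dbo.working_hours w
                                  ON w.employee_number = o.employee_number
                                 AND w.day_no = DATEDIFF(DAY, '19000101',
                                                         DATEADD(DAY, t.n, o.prev_date_sick)) % 7 + 1
                        WHERE t.n < DATEDIFF(DAY, o.prev_date_sick, o.date_sick)
                          AND w.working_hours > 0
                    ) THEN 1
               ELSE 0
           END AS is_new_spell
    FROM ordered o
),
spells AS (
    -- Running total of break flags numbers the absences within each employee
    SELECT f.*,
           SUM(f.is_new_spell) OVER (PARTITION BY f.employee_number
                                     ORDER BY f.date_sick
                                     ROWS UNBOUNDED PRECEDING) AS spell_no
    FROM flagged f
)
SELECT date_sick, day_no, day_name, employee_number, hours_lost, working_hours,
       -- Convert the per-employee spell number into one id across all employees
       DENSE_RANK() OVER (ORDER BY employee_number, spell_no) AS sickness_id
FROM spells
ORDER BY employee_number,
         date_sick;
```

The break flag is computed in its own CTE because SQL Server does not allow a subquery inside an aggregate, so the EXISTS test cannot sit directly inside SUM(). This version needs no calendar or tally table, at the cost of relying on the weekly pattern repeating without exceptions such as public holidays.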

Source: https://stackoverflow.com/questions/65126701/tracking-continuous-days-of-absence-from-work-days-only-sql

Tags

sql

sql-server

ssms

partitioning