Applying LAG() to multiple rows with a null value

Question

Given:

with    
m as (
    select  1 ID, cast('03/01/2015' as datetime) PERIOD_START, cast('3/31/2015' as datetime) PERIOD_END
    union all
    select  1 ID, '04/01/2015', '4/28/2015'
    union all
    select  1 ID, '05/01/2015', '5/31/2015'
    union all
    select  1 ID, '06/01/2015', '06/30/2015'
    union all
    select  1 ID, '07/01/2015', '07/31/2015'
),
a as (
    SELECT  1 ID, cast('2015-03-13 14:17:00.000' as datetime) AUDIT_TIME, 'READ [2]' STATUS
    UNION ALL
    SELECT  1 ID, '2015-04-27 15:51:00.000' AUDIT_TIME, 'HELD [2]' STATUS
    UNION ALL
    SELECT  1 ID, '2015-07-08 17:54:00.000' AUDIT_TIME, 'COMPLETED [5]' STATUS
)

This query:

select  m.ID,PERIOD_START,PERIOD_END
        ,a.AUDIT_TIME,STATUS
from    m
LEFT OUTER JOIN a on m.id=a.id 
    and a.audit_time between m.period_start and m.period_end

generates this record set:

ID  PERIOD_START    PERIOD_END  AUDIT_TIME  STATUS
1   2015-03-01 00:00:00.000 2015-03-31 00:00:00.000 2015-03-13 14:17:00.000 READ [2]
1   2015-04-01 00:00:00.000 2015-04-28 00:00:00.000 2015-04-27 15:51:00.000 HELD [2]
1   2015-05-01 00:00:00.000 2015-05-31 00:00:00.000 NULL    NULL
1   2015-06-01 00:00:00.000 2015-06-30 00:00:00.000 NULL    NULL
1   2015-07-01 00:00:00.000 2015-07-31 00:00:00.000 2015-07-08 17:54:00.000 COMPLETED [5]

I need the 4/27/15 entry repeated for May and June:

ID  PERIOD_START    PERIOD_END  AUDIT_TIME  STATUS
1   2015-03-01 00:00:00.000 2015-03-31 00:00:00.000 2015-03-13 14:17:00.000 READ [2]
1   2015-04-01 00:00:00.000 2015-04-28 00:00:00.000 2015-04-27 15:51:00.000 HELD [2]
1   2015-05-01 00:00:00.000 2015-05-31 00:00:00.000 2015-04-27 15:51:00.000 HELD [2]
1   2015-06-01 00:00:00.000 2015-06-30 00:00:00.000 2015-04-27 15:51:00.000 HELD [2]
1   2015-07-01 00:00:00.000 2015-07-31 00:00:00.000 2015-07-08 17:54:00.000 COMPLETED [5]

Using the LAG() function:

select  m.ID,PERIOD_START,PERIOD_END
        ,a.AUDIT_TIME
        ,LAG(audit_time) OVER (partition by m.ID order by period_start) PRIOR_AUDIT_TIME
        ,STATUS
        ,LAG(STATUS) OVER (partition by m.ID order by period_start) PRIOR_STATUS
from    m
LEFT OUTER JOIN a on m.id=a.id 
    and a.audit_time between m.period_start and m.period_end

only works for a single row:

ID  PERIOD_START    PERIOD_END  AUDIT_TIME  PRIOR_AUDIT_TIME    STATUS  PRIOR_STATUS
1   2015-03-01 00:00:00.000 2015-03-31 00:00:00.000 2015-03-13 14:17:00.000 NULL    READ [2]    NULL
1   2015-04-01 00:00:00.000 2015-04-28 00:00:00.000 2015-04-27 15:51:00.000 2015-03-13 14:17:00.000 HELD [2]    READ [2]
1   2015-05-01 00:00:00.000 2015-05-31 00:00:00.000 NULL    2015-04-27 15:51:00.000 NULL    HELD [2]
1   2015-06-01 00:00:00.000 2015-06-30 00:00:00.000 NULL    NULL    NULL    NULL
1   2015-07-01 00:00:00.000 2015-07-31 00:00:00.000 2015-07-08 17:54:00.000 NULL    COMPLETED [5]   NULL

Is there a way to do this without having to resort to a cursor?

Answer 1:

You can do this with window functions:

with q as (
      select m.ID, PERIOD_START, PERIOD_END, a.AUDIT_TIME, STATUS
      from m LEFT OUTER JOIN
           a
           on m.id = a.id and
              a.audit_time between m.period_start and m.period_end
     )
select q.*,
       max(status) over (partition by id, audit_grp) as imputed_status
from (select q.*,
             max(audit_time) over (partition by id order by period_start) as audit_grp
      from q
     ) q

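The idea is to carry the last non-NULL audit_time forward by using max() as a cumulative (running) window function. Because audit times increase from one period to the next, the running maximum at each row is the most recent audit_time seen so far. That carried-forward value (audit_grp) then identifies a group of rows, and max(status) within each group picks up the group's non-NULL status.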
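
For reference, here is a minimal self-contained sketch that combines the sample data from the question with the running-max approach. The CTE name g and the output aliases IMPUTED_AUDIT_TIME and IMPUTED_STATUS are illustrative choices, and the date literals are written in ISO form on the assumption that this avoids DATEFORMAT surprises. It assumes audit times never decrease as PERIOD_START increases, which holds for this data.

```sql
with m as (
    select 1 ID, cast('20150301' as datetime) PERIOD_START, cast('20150331' as datetime) PERIOD_END
    union all select 1, '20150401', '20150428'
    union all select 1, '20150501', '20150531'
    union all select 1, '20150601', '20150630'
    union all select 1, '20150701', '20150731'
),
a as (
    select 1 ID, cast('2015-03-13T14:17:00' as datetime) AUDIT_TIME, 'READ [2]' STATUS
    union all select 1, '2015-04-27T15:51:00', 'HELD [2]'
    union all select 1, '2015-07-08T17:54:00', 'COMPLETED [5]'
),
q as (
    -- One row per period, with the audit that falls inside it (if any)
    select m.ID, m.PERIOD_START, m.PERIOD_END, a.AUDIT_TIME, a.STATUS
    from m
    left outer join a
        on m.ID = a.ID
       and a.AUDIT_TIME between m.PERIOD_START and m.PERIOD_END
),
g as (
    -- Running max: the most recent non-NULL audit time up to this period
    select q.*,
           max(AUDIT_TIME) over (partition by ID order by PERIOD_START) as AUDIT_GRP
    from q
)
select ID, PERIOD_START, PERIOD_END,
       AUDIT_GRP as IMPUTED_AUDIT_TIME,
       -- Rows sharing an AUDIT_GRP share the status of the one audited row
       max(STATUS) over (partition by ID, AUDIT_GRP) as IMPUTED_STATUS
from g
order by ID, PERIOD_START;
```

With the sample data, May and June should pick up the 2015-04-27 audit time and the HELD [2] status, matching the desired result set in the question. Periods that precede the first audit for an ID would keep NULL in both columns.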

The ANSI standard defines an IGNORE NULLS option for LAG(), but SQL Server 2012 does not support it; the option was added in SQL Server 2022.
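
On SQL Server 2022 or later, where IGNORE NULLS is available, the fill-forward can be written directly. The sketch below is untested against a particular build and swaps LAG() for LAST_VALUE(), since LAG() looks only at earlier rows and would drop the current row's own value; the output aliases are illustrative, and the explicit ROWS frame is an assumption made to keep the window bounded at the current row.

```sql
-- Assumes SQL Server 2022 or later (IGNORE NULLS is not available in 2012)
with m as (
    select 1 ID, cast('20150301' as datetime) PERIOD_START, cast('20150331' as datetime) PERIOD_END
    union all select 1, '20150401', '20150428'
    union all select 1, '20150501', '20150531'
    union all select 1, '20150601', '20150630'
    union all select 1, '20150701', '20150731'
),
a as (
    select 1 ID, cast('2015-03-13T14:17:00' as datetime) AUDIT_TIME, 'READ [2]' STATUS
    union all select 1, '2015-04-27T15:51:00', 'HELD [2]'
    union all select 1, '2015-07-08T17:54:00', 'COMPLETED [5]'
),
q as (
    select m.ID, m.PERIOD_START, m.PERIOD_END, a.AUDIT_TIME, a.STATUS
    from m
    left outer join a
        on m.ID = a.ID
       and a.AUDIT_TIME between m.PERIOD_START and m.PERIOD_END
)
select ID, PERIOD_START, PERIOD_END,
       -- Last non-NULL value seen up to and including the current period
       last_value(AUDIT_TIME) ignore nulls over (
           partition by ID order by PERIOD_START
           rows between unbounded preceding and current row
       ) as IMPUTED_AUDIT_TIME,
       last_value(STATUS) ignore nulls over (
           partition by ID order by PERIOD_START
           rows between unbounded preceding and current row
       ) as IMPUTED_STATUS
from q
order by ID, PERIOD_START;
```

Unlike the running-max version, this form does not rely on audit times increasing from period to period, because it takes the most recent non-NULL value by row order rather than the largest one.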

Source: https://stackoverflow.com/questions/31440623/applying-lag-to-multiple-rows-with-a-null-value

Tags

sql

sql-server

sql-server-2012