Condense Time Periods with SQL

春和景丽 2021-02-20 17:41

I have a large data set which, for the purpose of this question, has 3 fields:

  • Group Identifier
  • From Date
  • To Date

On any given row t

4 Answers
  • 2021-02-20 18:19

    A Geometric Approach

    Here and elsewhere, I've noticed that answers to date-packing questions rarely offer a geometric approach. After all, any range, date ranges included, can be interpreted as a line. So why not convert the ranges to SQL Server's geometry type and use geometry::UnionAggregate to merge them? I took a stab at it using the data from your post.

    Code Description

    In 'numbers':

    • I build a table representing a sequence
    • Swap it out with your favorite way to make a numbers table.
    • A union never yields more lines than the original table has rows, so I use that table as the base for the sequence.

    In 'mergeLines':

    • I convert the dates to floats and use those floats to create geometrical points.
    • In this problem we're working in 'integer space', meaning there is no time-of-day component, so a begin date in one range that is one day after an end date in another should be merged with that other range. To make that merge happen, we need to convert to 'real space', so we add 1 to the tail of every range (we undo this later).
    • I then connect these points via STUnion and STEnvelope.
    • Finally, I merge all these lines via UnionAggregate. The resulting 'lines' geometry object might contain multiple lines, but if they overlap, they turn into one line.

    In the outer query:

    • I use the numbers CTE to extract the individual lines inside 'lines'.
    • I take the envelope of each line, which here ensures that each line is represented only by its two endpoints.
    • I read the endpoint x values and convert them back to dates, subtracting the 1 added earlier to return to 'integer space'.
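
    The shift between 'integer space' and 'real space' can be tried outside SQL Server. The following is a minimal Python sketch, assuming inclusive day ranges for a single group; the function name merge_closed_day_ranges and the sample spans are illustrative and not part of the query, and a plain sort-and-extend loop stands in for geometry::UnionAggregate.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_closed_day_ranges(ranges):
    """Union inclusive (from_date, to_date) ranges, merging ranges that touch."""
    # Move to "real space": push each end out by one day so the range is half-open.
    half_open = sorted((start, end + ONE_DAY) for start, end in ranges)
    merged = []
    for start, end in half_open:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current line: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Back to "integer space": undo the one-day shift on the end.
    return [(start, end - ONE_DAY) for start, end in merged]


# Illustrative data: the first two ranges are adjacent, the third is separate.
spans = [
    (date(2012, 1, 1), date(2012, 12, 31)),
    (date(2013, 1, 1), date(2013, 12, 31)),
    (date(2015, 1, 1), date(2015, 12, 31)),
]
result = merge_closed_day_ranges(spans)
```

    Without the one-day shift, the 2012 and 2013 ranges would not touch and would stay separate, which is why the query adds 1 to every end point before the union.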

    The Code

    with
    
        numbers as (
    
            select  row_number() over (order by (select null)) i 
            from    @spans -- Where I put your data; fromDate/toDate must be datetime, since date cannot be converted to float
    
        ),
    
        mergeLines as (
    
            select      groupId,
                        lines = geometry::UnionAggregate(line)
            from        @spans
            cross apply (select 
                            startP = geometry::Point(convert(float,fromDate), 0, 0),
                            stopP = geometry::Point(convert(float,toDate) + 1, 0, 0)
                        ) pointify
            cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
            group by    groupId 
    
        )
    
        select      groupId, fromDate, toDate 
        from        mergeLines ml
        join        numbers n on n.i between 1 and ml.lines.STNumGeometries()
        cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
        cross apply (select 
                        fromDate = convert(datetime, l.line.STPointN(1).STX),
                        toDate = convert(datetime, l.line.STPointN(3).STX) - 1
                    ) unprepare
        order by    groupId, fromDate;
    
  • 2021-02-20 18:24

    I'd use a Calendar table. This table simply has a list of dates for several decades.

    CREATE TABLE [dbo].[Calendar](
        [dt] [date] NOT NULL,
    CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED 
    (
        [dt] ASC
    ))
    

    There are many ways to populate such a table.

    For example, 100K rows (~270 years) from 1900-01-01:

    INSERT INTO dbo.Calendar (dt)
    SELECT TOP (100000) 
        DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
    FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
    OPTION (MAXDOP 1);
    

    Once you have a Calendar table, here is how to use it.

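    Each original row is joined with the Calendar table to return one row for every date between From and To.

    Then possible duplicates are removed with DISTINCT.

    Then comes classic gaps-and-islands: the rows are numbered in two sequences (a row number within each group and a day count from a fixed date), and the difference between the two stays constant within each run of consecutive dates.

    Finally, the rows are grouped by that difference to get the new From and To of each island.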
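
    The four steps above can be sketched procedurally. This is a minimal Python illustration under stated assumptions, not a translation of the query: pack_by_islands and the sample rows are hypothetical names and data, and an in-memory loop over days stands in for the join to the Calendar table.

```python
from datetime import date, timedelta
from itertools import groupby


def pack_by_islands(rows):
    """rows: iterable of (group_id, from_date, to_date) with inclusive dates."""
    # Steps 1-2: expand every row into its individual days; the set drops duplicates.
    days = set()
    for group_id, start, end in rows:
        for offset in range((end - start).days + 1):
            days.add((group_id, start + timedelta(days=offset)))

    packed = []
    for group_id, members in groupby(sorted(days), key=lambda d: d[0]):
        dates = [d for _, d in members]
        # Step 3: day number minus row number is constant inside an island.
        keyed = [(d.toordinal() - i, d) for i, d in enumerate(dates)]
        # Step 4: group by that constant and keep the first and last date.
        for _, island in groupby(keyed, key=lambda k: k[0]):
            island_dates = [d for _, d in island]
            packed.append((group_id, island_dates[0], island_dates[-1]))
    return packed


# Illustrative data: group 1 has an overlap and a gap, group 2 has adjacent ranges.
rows = [
    (1, date(2012, 1, 1), date(2012, 1, 10)),
    (1, date(2012, 1, 5), date(2012, 1, 20)),
    (1, date(2012, 2, 1), date(2012, 2, 3)),
    (2, date(2012, 1, 1), date(2012, 1, 2)),
    (2, date(2012, 1, 3), date(2012, 1, 4)),
]
islands = pack_by_islands(rows)
```

    The cost of this approach grows with the number of days covered rather than the number of rows, which is the same trade-off the Calendar join makes.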

    Sample data

    I added a second group.

    DECLARE @T TABLE (GroupID int, FromDate date, ToDate date);
    INSERT INTO @T (GroupID, FromDate, ToDate) VALUES
    (1, '2012-01-01', '2012-12-31'),
    (1, '2013-12-01', '2014-11-30'),
    (1, '2015-01-01', '2015-12-31'),
    (1, '2015-01-01', '2015-12-31'),
    (1, '2015-02-01', '2015-03-31'),
    (1, '2013-01-01', '2013-12-31'),
    (2, '2012-01-01', '2012-12-31'),
    (2, '2013-01-01', '2013-12-31');
    

    Query

    WITH
    CTE_AllDates
    AS
    (
        SELECT DISTINCT
            T.GroupID
            ,CA.dt
        FROM
            @T AS T
            CROSS APPLY
            (
                SELECT dbo.Calendar.dt
                FROM dbo.Calendar
                WHERE
                    dbo.Calendar.dt >= T.FromDate
                    AND dbo.Calendar.dt <= T.ToDate
            ) AS CA
    )
    ,CTE_Sequences
    AS
    (
        SELECT
            GroupID
            ,dt
            ,ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS Seq1
            ,DATEDIFF(day, '2001-01-01', dt) AS Seq2
            ,DATEDIFF(day, '2001-01-01', dt) - 
                ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS IslandNumber
        FROM CTE_AllDates
    )
    SELECT
        GroupID
        ,MIN(dt) AS NewFromDate
        ,MAX(dt) AS NewToDate
    FROM CTE_Sequences
    GROUP BY GroupID, IslandNumber
    ORDER BY GroupID, NewFromDate;
    

    Result

    +---------+-------------+------------+
    | GroupID | NewFromDate | NewToDate  |
    +---------+-------------+------------+
    |       1 | 2012-01-01  | 2014-11-30 |
    |       1 | 2015-01-01  | 2015-12-31 |
    |       2 | 2012-01-01  | 2013-12-31 |
    +---------+-------------+------------+
    
  • 2021-02-20 18:31
    ; with 
    cte as
    (
        select  *, rn = row_number() over (partition by [Group ID] order by [From Date])
        from    tbl
    ),
    rcte as
    (
        select  rn, [Group ID], [From Date], [To Date], GrpNo = 1, GrpFrom = [From Date], GrpTo = [To Date]
        from    cte
        where   rn  = 1
    
        union all
    
        select  c.rn, c.[Group ID], c.[From Date], c.[To Date], 
            GrpNo = case    when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                    or  c.[To Date]   between r.GrpFrom and r.GrpTo
                    then    r.GrpNo
                    else    r.GrpNo + 1
                    end,
            GrpFrom= case   when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                    or  c.[To Date]   between r.GrpFrom and r.GrpTo
                    then    case when c.[From Date] > r.GrpFrom then c.[From Date] else r.GrpFrom end
                    else    c.[From Date] 
                    end,
            GrpTo  = case   when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                    or  c.[To Date]   between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                    then    case when c.[To Date] > r.GrpTo then c.[To Date] else r.GrpTo end
                    else    c.[To Date]  
                    end
    
        from    rcte r
            inner join cte c    on  r.[Group ID]    = c.[Group ID]
                        and r.rn        = c.rn - 1
    )
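    select  [Group ID], [From Date] = min(GrpFrom), [To Date] = max(GrpTo)
    from    rcte
    group by [Group ID], GrpNo
    option  (maxrecursion 0)  -- default limit is 100 recursion levels; lift it for groups with more rows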
    
  • 2021-02-20 18:38

    The solution from the book "Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions":

    ;with C1 as(
    select GroupID, FromDate as ts, +1 as type, 1 as sub
      from dbo.table_name
    union all
    select GroupID, dateadd(day, +1, ToDate) as ts, -1 as type, 0 as sub
      from dbo.table_name),
    C2 as(
    select C1.*
         , sum(type) over(partition by GroupID order by ts, type desc
                          rows between unbounded preceding and current row) - sub as cnt
      from C1),
    C3 as(
    select GroupID, ts, floor((row_number() over(partition by GroupID order by ts) - 1) / 2 + 1) as grpnum
      from C2
      where cnt = 0)
    
    select GroupID, min(ts) as FromDate, dateadd(day, -1, max(ts)) as ToDate
      from C3
      group by GroupID, grpnum;
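
    The query above turns every range into a +1 event at FromDate and a -1 event on the day after ToDate, then looks for the points where the running count of open ranges is zero. A minimal Python sketch of that logic for a single group follows; pack_with_events is an illustrative name, and the sample rows are the ones from this answer's test table.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def pack_with_events(ranges):
    """ranges: inclusive (from_date, to_date) pairs belonging to one group."""
    events = []
    for start, end in ranges:
        events.append((start, +1))          # a range opens
        events.append((end + ONE_DAY, -1))  # it closes the day after ToDate
    # Same ordering as the query: by timestamp, opens before closes on ties.
    events.sort(key=lambda e: (e[0], -e[1]))

    packed, open_count, island_start = [], 0, None
    for ts, kind in events:
        if kind == +1:
            if open_count == 0:  # nothing open before this start: a new island begins
                island_start = ts
            open_count += 1
        else:
            open_count -= 1
            if open_count == 0:  # nothing open after this end: the island closes
                packed.append((island_start, ts - ONE_DAY))
    return packed


group_a = [
    (date(2012, 1, 1), date(2012, 12, 31)),
    (date(2013, 12, 1), date(2014, 11, 30)),
    (date(2015, 1, 1), date(2015, 12, 31)),
    (date(2015, 1, 1), date(2015, 12, 31)),
    (date(2015, 2, 1), date(2015, 3, 31)),
    (date(2013, 1, 1), date(2013, 12, 31)),
]
packed_a = pack_with_events(group_a)
```

    Sorting opens before closes on the same timestamp is what lets adjacent ranges such as 2012 and 2013 merge: the count never drops to zero between them.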
    

    Create table:

    if object_id('table_name') is not null
      drop table table_name
    create table table_name(GroupID varchar(100), FromDate datetime,ToDate datetime)
    insert into table_name
    -- 'YYYYMMDD' literals are read the same way under every DATEFORMAT/language setting
    select 'A', '20120101', '20121231' union all
    select 'A', '20131201', '20141130' union all
    select 'A', '20150101', '20151231' union all
    select 'A', '20150101', '20151231' union all
    select 'A', '20150201', '20150331' union all
    select 'A', '20130101', '20131231'
    