Merge overlapping date intervals

不知归路 2020-11-27 16:16

Is there a better way of merging overlapping date intervals?
The solution I came up with is so simple that now I wonder if someone else has a better idea of how this cou

7 answers
  • 2020-11-27 17:05

    A Geometric Approach

    Here and elsewhere, I've noticed that answers to date-packing questions don't offer a geometric approach. After all, any range, date ranges included, can be interpreted as a line. So why not convert the ranges to SQL Server's geometry type and use geometry::UnionAggregate to merge them?

    Why?

    This has the advantage of handling all types of overlaps, including fully nested ranges. It also works like any other aggregate query, so it's a little more intuitive in that respect. You also get the bonus of a visual representation of your results if you care to use it. Finally, it is the approach I use for simultaneous range packing (you work with rectangles instead of lines in that case, and there are many more considerations). I just couldn't get the existing approaches to work in that scenario.

    The disadvantage is that it requires SQL Server 2012 or later, where geometry::UnionAggregate was introduced. It also requires a numbers table, and it's annoying to extract the individual lines from the aggregated shape. Hopefully Microsoft will someday add a TVF that does this without a numbers table (or you can build one yourself). Also, geometry objects work with floats, so there are conversion annoyances and precision concerns to keep in mind.

    Performance-wise I don't know how it compares, but I've done a few things (not shown here) to make it work for me even with large datasets.

    Code Description

    In 'numbers':

    • I build a table representing a sequence of integers.
    • Swap it out for your favorite way to make a numbers table.
    • For a union operation, you never need more rows than your original table has, so I use that table as the base for the sequence.
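
    If you would rather not derive the sequence from @t, here is one possible stand-in, sketched under the assumption that 1000 rows is enough (the cap and the CTE name 'digits' are my own choices, not part of the answer's query):

    ```sql
    -- Sketch: a numbers CTE that does not depend on @t.
    -- Assumption: at most 1000 merged ranges; add another digits join if you need more.
    with
        digits as (
            select d from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(d)
        ),
        numbers as (
            select  i = 1 + a.d + 10 * b.d + 100 * c.d   -- 1 through 1000
            from    digits a
            cross join digits b
            cross join digits c
        )
    select i from numbers order by i;
    ```

    Only the 'numbers' CTE would change; the rest of the query joins on n.i either way.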

    In 'mergeLines':

    • I convert the dates to floats and use those floats to create geometrical points.
    • In this problem we're working in 'integer space', meaning there are no time-of-day considerations, so a begin date in one range that is one day after an end date in another should be merged with that other range. To make that merge happen, we need to convert to 'real space', so we add 1 to the tail of every range (we undo this later).
    • I then connect these points via STUnion and STEnvelope.
    • Finally, I merge all these lines via UnionAggregate. The resulting 'lines' geometry object might contain multiple lines, but if they overlap, they turn into one line.
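
    The 'integer space' point can be checked with plain arithmetic before any geometry is involved. A small sketch with made-up dates (the variable names are hypothetical):

    ```sql
    -- Hypothetical values: one range ends on Jan 10, the next starts on Jan 11.
    declare @end1   datetime = '20200110',
            @start2 datetime = '20200111';

    -- convert(float, <datetime>) yields a day count, so midnight values differ by whole numbers.
    select  gap_in_integer_space = convert(float, @start2) - convert(float, @end1),        -- 1: the lines would not touch
            gap_in_real_space    = convert(float, @start2) - (convert(float, @end1) + 1);  -- 0: the lines now meet
    ```

    Without the + 1, two adjacent ranges become lines separated by a gap of one unit, and the union keeps them apart.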

    In the outer query:

    • I use the numbers CTE to extract the individual lines inside 'lines'.
    • I take the envelope of each line, which here ensures that each line is stored only as its two endpoints.
    • I read the endpoint x values and convert them back to datetime values, making sure to put them back into 'integer space'.
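
    To see the extraction step in isolation, here is a sketch that uses a hand-written geometry in place of the aggregated 'lines' value (the variable @lines and its coordinates are made up for illustration):

    ```sql
    -- Hypothetical stand-in for the aggregated shape: two separate lines on the x-axis.
    declare @lines geometry =
        geometry::STGeomFromText('MULTILINESTRING ((0 0, 5 0), (10 0, 12 0))', 0);

    select  pieces = @lines.STNumGeometries(),               -- how many separate lines the shape holds
            piece1 = @lines.STGeometryN(1).ToString(),       -- the first line as text
            piece2 = @lines.STGeometryN(2).ToString(),       -- the second line as text
            x1     = @lines.STGeometryN(1).STPointN(1).STX,  -- x of the first point of line 1
            x2     = @lines.STGeometryN(1).STPointN(2).STX;  -- x of the second point of line 1
    ```

    STGeometryN takes a 1-based index, which is why the numbers table is joined between 1 and STNumGeometries().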

    The Code
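
    The query reads from a table variable @t with columns d1 and d2, which the answer does not declare. A minimal setup to try it with, assuming datetime columns (as far as I know, convert(float, ...) is not allowed on the date type) and made-up sample rows; it must run in the same batch as the query:

    ```sql
    -- Hypothetical sample data; only the names @t, d1 and d2 come from the query.
    declare @t table (d1 datetime, d2 datetime);

    insert @t (d1, d2) values
        ('20200101', '20200105'),  -- overlaps the next row
        ('20200103', '20200110'),
        ('20200111', '20200115'),  -- starts the day after the previous row ends
        ('20200201', '20200210'),  -- fully contains the next row
        ('20200203', '20200204');
    ```

    The rows cover the three cases the answer mentions: plain overlap, ranges one day apart, and a fully nested range.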

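    -- Assumes a table variable @t with datetime columns d1 (range start) and d2 (range end).
    with

        -- One row per source row: enough indexes to address every merged line.
        numbers as (
            select  row_number() over (order by (select null)) i
            from    @t
        ),

        -- Turn each range into a line on the x-axis and union all lines into one geometry.
        mergeLines as (
            select      lines = geometry::UnionAggregate(line)
            from        @t
            cross apply (select line =
                            geometry::Point(convert(float, d1), 0, 0).STUnion(
                                geometry::Point(convert(float, d2) + 1, 0, 0) -- +1: integer space to real space
                            ).STEnvelope()
                        ) l
        )

        -- Pull each merged line back out and convert its endpoints to dates.
        select      ap.StartDate,
                    ap.EndDate
        from        mergeLines ml
        join        numbers n on n.i between 1 and ml.lines.STNumGeometries()
        cross apply (select line = ml.lines.STGeometryN(n.i).STEnvelope()) l
        cross apply (select
                        StartDate = convert(datetime, l.line.STPointN(1).STX),
                        EndDate = convert(datetime, l.line.STPointN(3).STX) - 1 -- undo the +1
                    ) ap
        order by    ap.StartDate;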