Flattening intersecting timespans

温柔的废话 2020-12-14 19:56

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan. The sample data posted

7 answers
  • 2020-12-14 20:17

    Here is a SQL-only solution. I used DATETIME for the columns. Storing the time separately is a mistake in my opinion, as you will have problems when the times go past midnight. You can adjust this to handle that situation if you need to. The solution also assumes that the start and end times are NOT NULL. Again, you can adjust as needed if that's not the case.
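
    As an illustration of the midnight problem, here is a minimal Python sketch that turns a date plus start/stop times of day into two full datetimes. The function name to_span and the rule "a stop time earlier than the start time means the span ended on the next day" are assumptions made for this example, not something the original table definition confirms.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def to_span(day: date, start: time, stop: time) -> Tuple[datetime, datetime]:
    """Combine a date with start/stop times of day into full datetimes."""
    begin = datetime.combine(day, start)
    end = datetime.combine(day, stop)
    # Assumption: a stop time earlier than the start time means the span
    # crossed midnight, so the end belongs to the following day.
    if stop < start:
        end += timedelta(days=1)
    return begin, end


# A span that starts late in the evening and stops after midnight.
begin, end = to_span(date(2009, 6, 3), time(22, 55), time(3, 0))
print(begin, "->", end)
```

    Once both columns are real datetimes, the comparisons in the query no longer need a special case for spans that wrap around midnight.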

    The general gist of the solution is to get all of the start times that don't overlap with any other spans, get all of the end times that don't overlap with any spans, then match the two together.

    The results match your expected results except in one case, where a check by hand suggests that your expected output has a mistake: on the 6th there should be a span that ends at 2009-06-06 10:18:45.000.

    SELECT
         ST.start_time,
         ET.end_time
    FROM
    (
         SELECT
              T1.start_time
         FROM
              dbo.Test_Time_Spans T1
         LEFT OUTER JOIN dbo.Test_Time_Spans T2 ON
              T2.start_time < T1.start_time AND
              T2.end_time >= T1.start_time
         WHERE
              T2.start_time IS NULL
    ) AS ST
    INNER JOIN
    (
         SELECT
              T3.end_time
         FROM
              dbo.Test_Time_Spans T3
         LEFT OUTER JOIN dbo.Test_Time_Spans T4 ON
              T4.end_time > T3.end_time AND
              T4.start_time <= T3.end_time
         WHERE
              T4.start_time IS NULL
    ) AS ET ON
         ET.end_time > ST.start_time
    LEFT OUTER JOIN
    (
         SELECT
              T5.end_time
         FROM
              dbo.Test_Time_Spans T5
         LEFT OUTER JOIN dbo.Test_Time_Spans T6 ON
              T6.end_time > T5.end_time AND
              T6.start_time <= T5.end_time
         WHERE
              T6.start_time IS NULL
    ) AS ET2 ON
         ET2.end_time > ST.start_time AND
         ET2.end_time < ET.end_time
    WHERE
         ET2.end_time IS NULL
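
    To make the idea behind the query easier to follow, here is a rough Python sketch of the same three steps: keep the start times not covered by an earlier span, keep the end times not covered by a later-ending span, then pair each start with the nearest end after it. The function name flatten_set_based is made up for this example, and the sketch assumes every span has end > start.

```python
from datetime import datetime


def flatten_set_based(spans):
    """spans: iterable of (start, end) pairs; returns merged (start, end) pairs."""
    spans = list(spans)
    # Start times that no span beginning earlier reaches (mirrors the ST subquery).
    starts = sorted({
        s for s, _ in spans
        if not any(s2 < s <= e2 for s2, e2 in spans)
    })
    # End times that no span ending later covers (mirrors the ET subquery).
    ends = sorted({
        e for _, e in spans
        if not any(s2 <= e < e2 for s2, e2 in spans)
    })
    # Pair each surviving start with the nearest surviving end after it
    # (the job of the ET2 anti-join in the query).
    return [(s, min(e for e in ends if e > s)) for s in starts]


rows = [
    (datetime(2009, 6, 3, 10, 0), datetime(2009, 6, 3, 14, 0)),
    (datetime(2009, 6, 3, 13, 55), datetime(2009, 6, 3, 18, 0)),
    (datetime(2009, 6, 4, 3, 5), datetime(2009, 6, 4, 7, 0)),
]
for start, end in flatten_set_based(rows):
    print(start, "->", end)
```

    Like the query, this compares every span with every other span, so it is meant to explain the logic rather than to be fast.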
    
  • 2020-12-14 20:22

    In MySQL:

    SELECT  grouper, MIN(start) AS group_start, MAX(end) AS group_end
    FROM    (
            SELECT  start,
                    end,
                    @r := @r + (@edate < start) AS grouper,
                    @edate := GREATEST(end, CAST(@edate AS DATETIME))
            FROM    (
                    SELECT  @r := 0,
                            @edate := CAST('0000-01-01' AS DATETIME)
                    ) vars,
                    (
                    SELECT  rn_date + INTERVAL TIME_TO_SEC(rn_start) SECOND AS start,
                            rn_date + INTERVAL TIME_TO_SEC(rn_end) SECOND + INTERVAL (rn_start > rn_end) DAY AS end
                    FROM    t_ranges
                    ) q
            ORDER BY
                    start
            ) q
    GROUP BY
            grouper
    ORDER BY
            group_start
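
    The query above leans on MySQL session variables, which can be hard to read. A minimal Python sketch of the same idea follows: walk the rows in start order, keep a running maximum end, bump a group counter whenever a row starts after that maximum, then take the minimum start and maximum end per group. The names assign_groups and flatten_by_group are invented for this illustration.

```python
def assign_groups(spans):
    """Yield (group_id, start, end) for spans visited in start order."""
    group = 0
    running_end = None  # plays the role of @edate in the query
    for start, end in sorted(spans):
        # A row that starts after everything seen so far opens a new group.
        if running_end is None or running_end < start:
            group += 1
        running_end = end if running_end is None else max(running_end, end)
        yield group, start, end


def flatten_by_group(spans):
    """Aggregate each group to (min start, max end), like the outer GROUP BY."""
    bounds = {}
    for group, start, end in assign_groups(spans):
        lo, hi = bounds.get(group, (start, end))
        bounds[group] = (min(lo, start), max(hi, end))
    return [bounds[g] for g in sorted(bounds)]


print(flatten_by_group([(1, 10), (2, 3), (10, 12), (20, 21)]))
```

    Because the comparison is strict (running_end < start), spans that merely touch end up in the same group, matching the @edate < start test in the query.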
    

    The same solution for SQL Server is described in the following article on my blog:

    • Flattening timespans: SQL Server

    Here's the function to do this:

    DROP FUNCTION fn_spans
    GO
    CREATE FUNCTION fn_spans(@p_from DATETIME, @p_till DATETIME)
    RETURNS @t TABLE
            (
            q_start DATETIME NOT NULL,
            q_end DATETIME NOT NULL
            )
    AS
    BEGIN
            DECLARE @qs DATETIME
            DECLARE @qe DATETIME
            DECLARE @ms DATETIME
            DECLARE @me DATETIME
            DECLARE cr_span CURSOR FAST_FORWARD
            FOR
            SELECT  s_date + s_start AS q_start,
                    s_date + s_stop + CASE WHEN s_start < s_stop THEN 0 ELSE 1 END AS q_end
            FROM    t_span
            WHERE   s_date BETWEEN @p_from - 1 AND @p_till
                    AND s_date + s_start >= @p_from
                    AND s_date + s_stop <= @p_till
            ORDER BY
                    q_start
            OPEN    cr_span
            FETCH   NEXT
            FROM    cr_span
            INTO    @qs, @qe
            SET @ms = @qs
            SET @me = @qe
            WHILE @@FETCH_STATUS = 0
            BEGIN
                    FETCH   NEXT
                    FROM    cr_span
                    INTO    @qs, @qe
                    IF @qs > @me
                    BEGIN
                            INSERT
                            INTO    @t
                            VALUES (@ms, @me)
                            SET @ms = @qs
                    END
                    SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
            END
            IF @ms IS NOT NULL 
            BEGIN
                    INSERT
                    INTO    @t
                    VALUES  (@ms, @me)
            END
            CLOSE   cr_span
            DEALLOCATE cr_span
            RETURN
    END
    

    Since SQL Server lacks an easy way to refer to previously selected rows in a resultset, this is one of the rare cases where cursors in SQL Server work faster than set-based solutions.

    Tested on 1,440,000 rows, it takes 24 seconds for the full set and is almost instant for a range of a day or two.

    Note the additional condition in the SELECT query:

    s_date BETWEEN @p_from - 1 AND @p_till
    

    This seems to be redundant, but it is actually a coarse filter to make your index on s_date usable.

  • 2020-12-14 20:31

    Assuming you:

    • have some sort of simple custom Date object that stores a start date/time and end date/time
    • get the rows back in sorted order (by start date/time) as a list, L, of these Dates
    • want to create a flattened list of Dates, F

    Do the following:

    first = first row in L
    flat_date.start = first.start, flat_date.end = first.end
    For each remaining row in L:
        if row.start <= flat_date.end: // row overlaps or is adjacent to the current timespan
            if row.end > flat_date.end: // extend only when the row reaches further
                flat_date.end = row.end
        else: // gap found: end the current timespan and start a new one
            add a copy of flat_date to F
            flat_date.start = row.start, flat_date.end = row.end
    add flat_date to F // adding the last timespan to the flattened list
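
    For reference, a runnable Python sketch of this loop, using plain (start, end) pairs instead of a custom Date object. The function name flatten_sorted is an assumption for this example. A row that touches the current timespan or lies entirely inside it is treated as part of that timespan rather than as the start of a new one.

```python
from datetime import datetime


def flatten_sorted(spans):
    """Merge overlapping or adjacent (start, end) pairs in a single pass."""
    merged = []
    for start, end in sorted(spans):  # sorting by start is what the loop relies on
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current timespan: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap before this row: start a new timespan.
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


rows = [
    (datetime(2009, 6, 3, 10, 0), datetime(2009, 6, 3, 14, 0)),
    (datetime(2009, 6, 3, 11, 0), datetime(2009, 6, 3, 12, 0)),  # fully contained
    (datetime(2009, 6, 3, 14, 0), datetime(2009, 6, 3, 18, 0)),  # adjacent
    (datetime(2009, 6, 4, 3, 5), datetime(2009, 6, 4, 7, 0)),
]
for start, end in flatten_sorted(rows):
    print(start, "->", end)
```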
    
  • 2020-12-14 20:35

    Extending MahlerFive's answer, I wrote a Swift extension to DateTools. So far it has passed all my tests.

    extension DTTimePeriodCollection {
    
        func flatten() {
    
            self.sortByStartAscending()
    
            guard let periods = self.periods() else { return }
            if periods.count < 1 { return }
    
            var flattenedPeriods = [DTTimePeriod]()
            let flatdate = DTTimePeriod()
    
            for period in periods {
    
                guard let periodStart = period.StartDate, let periodEnd = period.EndDate else { continue }
    
                if !flatdate.hasStartDate() { flatdate.StartDate = periodStart }
                if !flatdate.hasEndDate() { flatdate.EndDate = periodEnd }
    
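                if periodStart.isEarlierThanOrEqualTo(flatdate.EndDate) {
    
                    // overlapping or adjacent: extend only if this period reaches further
                    if periodEnd.isGreaterThanOrEqualTo(flatdate.EndDate) { flatdate.EndDate = periodEnd }
    
                } else {
    
                    // gap: store the finished span and start a new one
                    flattenedPeriods.append(flatdate.copy())
                    flatdate.StartDate = periodStart
                    flatdate.EndDate = periodEnd
                }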
            }
    
            flattenedPeriods.append(flatdate.copy())
    
            // delete all periods
            for _ in 0..<periods.count { self.removeTimePeriodAtIndex(0) }
    
            // add flattened periods to self
            for flat in flattenedPeriods { self.addTimePeriod(flat) }
        }
    }
    
  • 2020-12-14 20:43

    Here is a recursive CTE solution, but I took the liberty of storing a full date and time in each column rather than keeping the date in a separate column. This helps to avoid some messy special-case code. If you must store the date separately, I would use a view or CTE to make it look like two datetime columns and go with this approach.

    Create test data:

    create table t1 (d1 datetime, d2 datetime)
    
    insert t1 (d1,d2)
        select           '2009-06-03 10:00:00', '2009-06-03 14:00:00'
        union all select '2009-06-03 13:55:00', '2009-06-03 18:00:00'
        union all select '2009-06-03 17:55:00', '2009-06-03 23:00:00'
        union all select '2009-06-03 22:55:00', '2009-06-04 03:00:00'
    
        union all select '2009-06-04 03:05:00', '2009-06-04 07:00:00'
    
        union all select '2009-06-04 07:05:00', '2009-06-04 10:00:00'
        union all select '2009-06-04 09:55:00', '2009-06-04 14:00:00'
    

    Recursive CTE:

    ;with dateRanges (ancestorD1, parentD1, d2, iter) as
    (
    --anchor is first level of collapse
        select
            d1 as ancestorD1,
            d1 as parentD1,
            d2,
            cast(0 as int) as iter
        from t1
    
    --recurse as long as there is another range to fold in
        union all select
            tLeft.ancestorD1,
            tRight.d1 as parentD1,
            tRight.d2,
            iter + 1  as iter
        from dateRanges as tLeft join t1 as tRight
            --join condition is that the t1 row can be consumed by the recursive row
            on tLeft.d2 between tRight.d1 and tRight.d2
                --exclude identical rows
                and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
    )
    select
        ranges1.*
    from dateRanges as ranges1
    where not exists (
        select 1
        from dateRanges as ranges2
        where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
            and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
            and ranges2.iter > ranges1.iter
    )
    

    Gives output:

    ancestorD1              parentD1                d2                      iter
    ----------------------- ----------------------- ----------------------- -----------
    2009-06-04 03:05:00.000 2009-06-04 03:05:00.000 2009-06-04 07:00:00.000 0
    2009-06-04 07:05:00.000 2009-06-04 09:55:00.000 2009-06-04 14:00:00.000 1
    2009-06-03 10:00:00.000 2009-06-03 22:55:00.000 2009-06-04 03:00:00.000 3
    
  • 2020-12-14 20:44

    Similar question on SO here:

    Min effective and termdate for contiguous dates

    FWIW, I up-voted the answer that recommended Joe Celko's SQL For Smarties, Third Edition (repeat: Third Edition, 2005), which discusses various approaches, both set-based and procedural.
