Flattening intersecting timespans

前端未结

关注

 7  1424

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan. The sample data posted

相关标签:

7条回答

一生所求

2020-12-14 20:17

Here is a SQL only solution. I used DATETIME for the columns. Storing the time separate is a mistake in my opinion, as you will have problems when the times go past midnight. You can adjust this to handle that situation though if you need to. The solution also assumes that the start and end times are NOT NULL. Again, you can adjust as needed if that's not the case.

The general gist of the solution is to get all of the start times that don't overlap with any other spans, get all of the end times that don't overlap with any spans, then match the two together.

The results match your expected results except in one case, which checking by hand looks like you have a mistake in your expected output. On the 6th there should be a span that ends at 2009-06-06 10:18:45.000.

SELECT
     ST.start_time,
     ET.end_time
FROM
(
     SELECT
          T1.start_time
     FROM
          dbo.Test_Time_Spans T1
     LEFT OUTER JOIN dbo.Test_Time_Spans T2 ON
          T2.start_time < T1.start_time AND
          T2.end_time >= T1.start_time
     WHERE
          T2.start_time IS NULL
) AS ST
INNER JOIN
(
     SELECT
          T3.end_time
     FROM
          dbo.Test_Time_Spans T3
     LEFT OUTER JOIN dbo.Test_Time_Spans T4 ON
          T4.end_time > T3.end_time AND
          T4.start_time <= T3.end_time
     WHERE
          T4.start_time IS NULL
) AS ET ON
     ET.end_time > ST.start_time
LEFT OUTER JOIN
(
     SELECT
          T5.end_time
     FROM
          dbo.Test_Time_Spans T5
     LEFT OUTER JOIN dbo.Test_Time_Spans T6 ON
          T6.end_time > T5.end_time AND
          T6.start_time <= T5.end_time
     WHERE
          T6.start_time IS NULL
) AS ET2 ON
     ET2.end_time > ST.start_time AND
     ET2.end_time < ET.end_time
WHERE
     ET2.end_time IS NULL

0 讨论(0)

灰色年华

2020-12-14 20:22

In MySQL:

SELECT  grouper, MIN(start) AS group_start, MAX(end) AS group_end
FROM    (
        SELECT  start,
                end,
                @r := @r + (@edate < start) AS grouper,
                @edate := GREATEST(end, CAST(@edate AS DATETIME))
        FROM    (
                SELECT  @r := 0,
                        @edate := CAST('0000-01-01' AS DATETIME)
                ) vars,
                (
                SELECT  rn_date + INTERVAL TIME_TO_SEC(rn_start) SECOND AS start,
                        rn_date + INTERVAL TIME_TO_SEC(rn_end) SECOND + INTERVAL (rn_start > rn_end) DAY AS end
                FROM    t_ranges
                ) q
        ORDER BY
                start
        ) q
GROUP BY
        grouper
ORDER BY
        group_start

Same decision for SQL Server is described in the following article in my blog:

Flattening timespans: SQL Server

Here's the function to do this:

DROP FUNCTION fn_spans
GO
CREATE FUNCTION fn_spans(@p_from DATETIME, @p_till DATETIME)
RETURNS @t TABLE
        (
        q_start DATETIME NOT NULL,
        q_end DATETIME NOT NULL
        )
AS
BEGIN
        DECLARE @qs DATETIME
        DECLARE @qe DATETIME
        DECLARE @ms DATETIME
        DECLARE @me DATETIME
        DECLARE cr_span CURSOR FAST_FORWARD
        FOR
        SELECT  s_date + s_start AS q_start,
                s_date + s_stop + CASE WHEN s_start < s_stop THEN 0 ELSE 1 END AS q_end
        FROM    t_span
        WHERE   s_date BETWEEN @p_from - 1 AND @p_till
                AND s_date + s_start >= @p_from
                AND s_date + s_stop <= @p_till
        ORDER BY
                q_start
        OPEN    cr_span
        FETCH   NEXT
        FROM    cr_span
        INTO    @qs, @qe
        SET @ms = @qs
        SET @me = @qe
        WHILE @@FETCH_STATUS = 0
        BEGIN
                FETCH   NEXT
                FROM    cr_span
                INTO    @qs, @qe
                IF @qs > @me
                BEGIN
                        INSERT
                        INTO    @t
                        VALUES (@ms, @me)
                        SET @ms = @qs
                END
                SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
        END
        IF @ms IS NOT NULL 
        BEGIN
                INSERT
                INTO    @t
                VALUES  (@ms, @me)
        END
        CLOSE   cr_span
        RETURN
END

Since SQL Server lacks an easy way to refer to previously selected rows in a resultset, this is one of rare cases when cursors in SQL Server work faster than set-based decisions.

Tested on 1,440,000 rows, works for 24 seconds for the full set, and almost instant for a range of day or two.

Note the additional condition in the SELECT query:

s_date BETWEEN @p_from - 1 AND @p_till

This seems to be redundant, but it is actually a coarse filter to make your index on s_date usable.

0 讨论(0)

滥情空心

2020-12-14 20:31

Assuming you:

have some sort of simple custom Date object that stores a start date/time and end date/time
get the rows back in sorted order (by start date/time) as a list, L, of these Dates
want to create a flattened list of Dates, F

Do the following:

first = first row in L
flat_date.start = first.start, flat_date.end = first.end
For each row in L:
    if row.start < flat_date.end and row.end > flat_date.end: // adding on to a timespan
        flat_date.end = row.end
    else: // ending a timespan and starting a new one
        add flat_date to F
        flat_date.start = row.start, flat_date.end = row.end
add flat_date to F // adding the last timespan to the flattened list

0 讨论(0)

梦如初夏

2020-12-14 20:35

Extending on MahlerFive answer I wrote a swift extension to DateTools. So far it has passed all my tests.

extension DTTimePeriodCollection {

    func flatten() {

        self.sortByStartAscending()

        guard let periods = self.periods() else { return }
        if periods.count < 1 { return }

        var flattenedPeriods = [DTTimePeriod]()
        let flatdate = DTTimePeriod()

        for period in periods {

            guard let periodStart = period.StartDate, let periodEnd = period.EndDate else { continue }

            if !flatdate.hasStartDate() { flatdate.StartDate = periodStart }
            if !flatdate.hasEndDate() { flatdate.EndDate = periodEnd }

            if periodStart.isEarlierThanOrEqualTo(flatdate.EndDate) && periodEnd.isGreaterThanOrEqualTo(flatdate.EndDate) {

                flatdate.EndDate = periodEnd

            } else {

                flattenedPeriods.append(flatdate.copy())
                flatdate.StartDate = periodStart
                flatdate.EndDate = periodEnd
            }
        }

        flattenedPeriods.append(flatdate.copy())

        // delete all periods
        for var i = 0 ; i < periods.count ; i++ { self.removeTimePeriodAtIndex(0) }

        // add flattened periods to self
        for flat in flattenedPeriods { self.addTimePeriod(flat) }
    }

0 讨论(0)

盖世英雄少女心

2020-12-14 20:43

Here is a recursive CTE solution, but I took the liberty of assigning a date and time to each column rather than pulling the date out separately. Helps to avoid some messy special case code. If you must store the date separately, I would use a view of CTE to make it look like two datetime columns and go with this approach.

create test data:

create table t1 (d1 datetime, d2 datetime)

insert t1 (d1,d2)
    select           '2009-06-03 10:00:00', '2009-06-03 14:00:00'
    union all select '2009-06-03 13:55:00', '2009-06-03 18:00:00'
    union all select '2009-06-03 17:55:00', '2009-06-03 23:00:00'
    union all select '2009-06-03 22:55:00', '2009-06-04 03:00:00'

    union all select '2009-06-04 03:05:00', '2009-06-04 07:00:00'

    union all select '2009-06-04 07:05:00', '2009-06-04 10:00:00'
    union all select '2009-06-04 09:55:00', '2009-06-04 14:00:00'

Recursive CTE:

;with dateRanges (ancestorD1, parentD1, d2, iter) as
(
--anchor is first level of collapse
    select
        d1 as ancestorD1,
        d1 as parentD1,
        d2,
        cast(0 as int) as iter
    from t1

--recurse as long as there is another range to fold in
    union all select
        tLeft.ancestorD1,
        tRight.d1 as parentD1,
        tRight.d2,
        iter + 1  as iter
    from dateRanges as tLeft join t1 as tRight
        --join condition is that the t1 row can be consumed by the recursive row
        on tLeft.d2 between tRight.d1 and tRight.d2
            --exclude identical rows
            and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
)
select
    ranges1.*
from dateRanges as ranges1
where not exists (
    select 1
    from dateRanges as ranges2
    where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
        and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
        and ranges2.iter > ranges1.iter
)

Gives output:

ancestorD1              parentD1                d2                      iter
----------------------- ----------------------- ----------------------- -----------
2009-06-04 03:05:00.000 2009-06-04 03:05:00.000 2009-06-04 07:00:00.000 0
2009-06-04 07:05:00.000 2009-06-04 09:55:00.000 2009-06-04 14:00:00.000 1
2009-06-03 10:00:00.000 2009-06-03 22:55:00.000 2009-06-04 03:00:00.000 3

0 讨论(0)

梦如初夏

2020-12-14 20:44

Similar question on SO here:

Min effective and termdate for contiguous dates

FWIW I up-voted the one that recommended Joe Celko's SQL For Smarties, Third Edition -- repeat: Third Edition (2005) -- which discusses various approaches, set base and procedural.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页