Combine rows when the end time of one is the start time of another (Oracle)

强颜欢笑 提交于 2019-12-01 18:49:44
MatBailie

Maybe this? (I don't have a SQL machine to run it on)

WITH
  sequenced_data AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
    ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
    *
  FROM
    data
)
SELECT
  name,
  MIN(start_inst) start_inst,
  MAX(end_inst)   end_inst,
  code,
  subcode
FROM
  sequenced_data
GROUP BY
  name,
  code,
  subcode,
  NameSequenceID - NameStateSequenceID

Here is a solution using a recursive query instead of analytic functions (as suggested by @wildplasser):

SELECT   name, code, subcode, MIN(start_inst) AS start_inst, MAX(end_inst) AS end_inst
FROM     (SELECT     name,
                     start_inst,
                     end_inst,
                     code,
                     subcode,
                     MIN(CONNECT_BY_ROOT (start_inst)) AS root_start
          FROM       data d
          CONNECT BY PRIOR name = name 
                 AND PRIOR end_inst = start_inst 
                 AND PRIOR code = code 
                 AND PRIOR subcode = subcode
          GROUP BY   name, start_inst, end_inst, code, subcode)
GROUP BY name, code, subcode, root_start;

The connect by clause in the innermost query causes the data to be returned in a hierarchical fashion. connect_by_root gives us the value at the root of each branch. Because we don't have a good candidate for a start with clause, we'll get all child rows (where end_inst equals another row's start_inst and all other columns are the same) multiple times: once as a root and once (or more) as a branch. Taking the min of the root eliminates these extra rows while giving us a value to group on in the outer query.

In the outer query, we perform another group by to consolidate the rows. The difference is that, in this case, we have root_start there as well to identify which rows are consecutive and therefore need to be consolidated.

Here's another approach:

SELECT
    name,
    min(start_inst) AS start_inst,
    max(end_inst) AS end_inst,
    code,
    subcode
FROM
    (
        SELECT
            A.*,
            COUNT
            (
                CASE WHEN start_inst = previous_end_inst THEN NULL
                ELSE 1
                END
            )
            OVER
            (
                ORDER BY
                    start_inst,
                    name,
                    code,
                    subcode
            ) AS group_number
        FROM
            (
                SELECT
                    name,
                    start_inst,
                    end_inst,
                    LAG
                    (
                      end_inst
                    )
                    OVER
                    (
                        PARTITION BY
                            name,
                            code,
                            subcode
                        ORDER BY
                            start_inst
                    ) AS previous_end_inst,
                    code,
                    subcode
                FROM
                    data
            ) A
        ) B
GROUP BY
    name,
    code,
    subcode,
    group_number
ORDER BY
    group_number

Basically:

  1. For each row, subquery A finds the previous end time for the given name, code, and subcode.

  2. For each row, subquery B calculates the "group number" -- a running count of preceeding rows (in order of start_inst, name, code, and subcode) where the previous end time calculated in Step 1 is not equal to the start time.

  3. The outer query aggregates by group number.

For better or worse, this approach, unlike @stevo's, will create a new "group" if there's a "gap" between the end time of one record and the start time of the next. For example, if you were to create a gap between 12:57 and 13:00 like this...

UPDATE data
SET start_inst = TO_DATE('9/12/2011 13:00', 'MM/DD/YYYY HH24:MI')
WHERE start_inst = TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI');

...the query above would return two rows like this...

NAME                 START_INST       END_INST               CODE    SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1              09/12/2011 12:26 09/12/2011 12:57        161         71
Person1              09/12/2011 13:00 09/12/2011 13:07        161         71
.
.
.

...whereas @stevo's query would return one row like this...

NAME                 START_INST       END_INST               CODE    SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1              12/09/2011 12:26 12/09/2011 13:07        161         71
.
.
.

Hope this helps.

adapting desm's query, I think this should work

WITH
  sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst)     NameStateSequenceID,
d.*
FROM
data d
) 
SELECT
  name,
  to_char(MIN(start_inst),'DD/MM/YYYY HH24:MI') start_inst,
  to_char(MAX(end_inst),'DD/MM/YYYY HH24:MI')   end_inst,
  code,
  subcode
FROM
  sequenced_data
GROUP BY
  name,
  code,
  subcode,
  NameSequenceID - NameStateSequenceID
ORDER BY name,start_inst
wildplasser

You can do that with a recursive query (something with CONNECT BY / PRIOR in oracle, IIRC) I did the same thing for Postgres in this thread : Get total time interval from multiple rows if sequence not broken

It might need a bit of reworking to make it fit into the oracle syntax.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!