SQL min values from two columns across two tables against ID

Question

I have the data shown below. I want to get the minimum start message timestamp together with the corresponding minimum success message timestamp. If there is no start or no success message, the missing side should show null.

Start Message Table:

ID1     Timestamp_start_msg_recieved    date        jobid      message time in seconds
1234    5/14/2014 10:02:29              5/14/2014   abc        start 262
1234    5/14/2014 10:02:31              5/14/2014   abc        start 264
1234    5/14/2014 10:02:45              5/14/2014   abc        start 278
1234    5/14/2014 10:02:50              5/14/2014   abc        start 285
1234    5/14/2014 10:09:04              5/14/2014   abc        start 165
1234    5/14/2014 10:09:06              5/14/2014   abc        start 2167
1234    5/14/2014 10:09:16              5/14/2014   abc        start 2180
1234    5/14/2014 10:09:26              5/14/2014   abc        start 2190
1234    5/14/2014 11:45:11              5/14/2014   abc        start 8767
1234    5/14/2014 16:48:20              5/14/2014   abc        start 878
1234    5/14/2014 19:02:52              5/14/2014   abc        start 687
5678    5/14/2014 22:02:52              5/14/2014   pqr        start 501
5678    5/14/2014 23:10:40              5/14/2014   abcd        start 200

Success Message Table:

ID1     Timestamp_success_msg_recieved  date        jobid  message time in seconds
1234    5/14/2014 10:02:52              5/14/2014   abc    successful 290
1234    5/14/2014 10:09:32              5/14/2014   abc    successful 4280 
1234    5/14/2014 11:45:15              5/14/2014   abc    successful 8774
1234    5/14/2014 11:45:18              5/14/2014   abc    successful 8777
1234    5/14/2014 11:45:19              5/14/2014   abc    successful 8778
1234    5/14/2014 11:45:25              5/14/2014   abc    successful 8784
1234    5/14/2014 16:48:22              5/14/2014   abc    successful 880 
1234    5/14/2014 19:03:00              5/14/2014   abc    successful 699
5678    5/14/2014 22:03:00              5/14/2014   pqr    successful 250
5678    5/19/2014 14:00:16              5/19/2014   pqr    successful 400

Expected Result:

ID1  TIMESTAMP_for_start_message TIMESTAMP_for_success_message   Date       Jobid    msg  msg start_secs success_secs
1234 5/14/2014 10:02:29         5/14/2014 10:02:52           5/14/2014  abc start success 262 290 
1234 5/14/2014 10:09:04         5/14/2014 10:09:32           5/14/2014  abc start success 165 4280
1234 5/14/2014 11:45:11         5/14/2014 11:45:25           5/14/2014  abc start success 8767 8784
1234 5/14/2014 16:48:20         5/14/2014 16:48:22           5/14/2014  abc start success 878 880
1234 5/14/2014 19:02:52         5/14/2014 19:03:00           5/14/2014  abc start success 687 699
5678 5/14/2014 22:02:52         5/14/2014 22:03:00           5/14/2014  pqr start success 501 699
5678 5/14/2014 23:10:40         null                         5/14/2014  abcd start success 250 null
5678    null                   5/19/2014 14:00:16            5/19/2014  pqr null  success null 400

I am trying to get the minimum start timestamp combined with the very next minimum success timestamp for the same id1 and jobid. If there is a list of start messages and no success message for a given id1 and jobid, the success columns should show NULL, and vice versa. I tried a temporary result set built with a WITH clause and also a self-join. My query is below, but the WITH clause query returns the MIN over all the data in the table.

NOTE: The TIME IN SECONDS values are random, not actual data.

Query Used:

WITH DATA AS
  (SELECT MIN(smt.column13) timestamp_for_success_message
  FROM success_table1 smt, start_table2 b
     WHERE
    (SMT.id1 = b.id1)
    AND (SMT.jobid = b.jobid)
    AND (SMT.timestamp_for_success_message_recieved >= b.timestamp_for_start_message_recieved)
  )
SELECT distinct a.timestamp_for_success_message_recieved,
  b.timestamp_for_start_message_recieved,
  b.id1,
  b.jobid
FROM data a,
  start_table2 b
order by b.timestamp_start_message_recieved, a.timestamp_for_success_message_recieved, b.jobid, b.id1;

Answer 1:

select nvl(a.ID1,b.ID1) ID1 ,  start_timestamp , success_timestamp
from
(select ID1 , min(timestamp) start_timestamp
from Start_Message_Table
group by ID1) a
full outer join
(select ID1 , min(timestamp) success_timestamp
from Success_Message_Table
group by ID1) b
on a.ID1 = b.ID1;

I hope I have understood the problem correctly. Try the query above, and add any extra columns you need to the inner queries.
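
As an illustration of adding extra columns to the inner queries, here is a minimal sketch that also groups and joins on jobid. The table names are the ones used in the query above, and the timestamp column names are taken from the headers in the question; both are assumptions about the real schema. Note that this still returns only one row per id1 and jobid pair, so it does not split a job into several start/success runs.

```sql
-- Sketch only: table and column names are assumed, not confirmed by the asker.
select nvl(a.id1, b.id1)     as id1,
       nvl(a.jobid, b.jobid) as jobid,
       a.start_timestamp,
       b.success_timestamp
from (select id1, jobid,
             min(timestamp_start_msg_recieved) as start_timestamp
        from start_message_table
       group by id1, jobid) a
full outer join
     (select id1, jobid,
             min(timestamp_success_msg_recieved) as success_timestamp
        from success_message_table
       group by id1, jobid) b
  on a.id1 = b.id1
 and a.jobid = b.jobid;
```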

Answer 2:

This solution is not a single query: it requires creating a table for the results and running a procedure. I think it is possible to solve this problem with a recursive query, but I did not manage to write one.

The procedure is not optimised and is probably slow on large data sets, but it works.

One more thing: I'm not sure what "TIME IN SECONDS has random values and not actual data" means. Your expected results suggest that you ignore the seconds when matching but still want them in the output, so I rebuilt the code accordingly (which probably slows things down because of all the trunc calls). It would also be easier if you added a primary key to the tables.
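
To make the "seconds ignored" idea concrete, here is a small sketch of the minute bucketing that the procedure relies on. It assumes the TABLE_END table from the input data preparation section further down. Rows whose timestamps fall in the same minute, such as the four success messages between 11:45:15 and 11:45:25, get the same minute_bucket value and are therefore treated as one success event.

```sql
-- Sketch only: assumes TABLE_END as defined under "Input data preparation".
-- trunc(..., 'mi') drops the seconds, so all rows within one minute share a bucket.
select time_end,
       trunc(time_end, 'mi') as minute_bucket
  from table_end
 where id1 = 1234
   and jobid = 'abc'
 order by time_end;
```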

Code to create table, run procedure and get results:

create table table_pairs as 
  select ts.ID1, ts.TIME_START, te.TIME_END, ts.tdate, 
      ts.JOBID, ts.MSG msg_start, te.MSG msg_end, 
      cast(null as timestamp) time_start_max
    from table_start ts, table_end te
    where 1=0;

begin p_pairs; end; 

select id1, to_char(time_start, 'yyyy-MM-dd HH24:mi:ss') time_start, 
    to_char(time_end, 'yyyy-MM-dd HH24:mi:ss') time_end,
    tdate, jobid, msg_start, msg_end
    --, to_char(time_start_max, 'yyyy-MM-dd HH24:mi:ss') time_start_max
  from table_pairs
  order by id1, time_start, time_end, jobid;

Results:

id1   time_start           time_end             tdate       jobid  msg_start    msg_end
----  -------------------  -------------------  ----------  -----  ------------ ----------------
1234  2014-05-14 10:02:29  2014-05-14 10:02:52  2014-05-14  abc    start 262    successful 290
1234  2014-05-14 10:09:04  2014-05-14 10:09:32  2014-05-14  abc    start 165    successful 4280
1234  2014-05-14 11:45:11  2014-05-14 11:45:25  2014-05-14  abc    start 8767   successful 8784
1234  2014-05-14 16:48:20  2014-05-14 16:48:22  2014-05-14  abc    start 878    successful 880
1234  2014-05-14 19:02:52  2014-05-14 19:03:00  2014-05-14  abc    start 687    successful 699
5678  2014-05-14 22:02:52  2014-05-14 22:03:00  2014-05-14  pqr    start 501    successful 250
5678  2014-05-14 23:10:40                       2014-05-14  abcd   start 200
5678                       2014-05-19 14:00:16  2014-05-14  pqr                 successful 400

Procedure:

create or replace procedure p_pairs is

  r_start table_start%rowtype;
  r_pair table_pairs%rowtype;
  v_start_min table_pairs.time_start%type;
  v_start_max table_pairs.time_start_max%type;
  cursor c_success is 
    select * from table_pairs order by id1, jobid, time_end for update;

begin

  begin -- delete everything from table_pairs and insert all ended processes
    delete from table_pairs;

    --simple version with proper seconds handling
    --insert into table_pairs (id1, jobid, tdate, time_end, msg_end) 
    --  (select id1, jobid, tdate, time_end, msg from table_end);

    -- complicated version for seconds ignored
    insert into table_pairs (id1, jobid, tdate, time_end, msg_end) 
      (select id1, jobid, tdate, max(time_end), max(msg)
        from (
          select id1, jobid, tdate, time_end, 
              last_value(msg) over (partition by id1, jobid, tdate, trunc(time_end, 'mi')
                order by null rows between unbounded preceding and unbounded following) msg
            from table_end) 
        group by id1, jobid, tdate, trunc(time_end, 'mi')
      );

  end;

  for r_pair in c_success
  loop

    begin -- find matching starting process

      select min(time_start), max(time_start) into v_start_min, v_start_max
        from (
          select * from table_start ts1
            where ts1.id1 = r_pair.id1 and ts1.jobid = r_pair.jobid 
              and trunc(ts1.time_start, 'mi') <= trunc(r_pair.time_end, 'mi')
          minus -- eliminate already "used" processes
          select * from table_start ts2
            where ts2.jobid = r_pair.jobid 
              and trunc(ts2.time_start, 'mi') <= (
                select trunc(max(time_start_max), 'mi') from table_pairs
                  where table_pairs.jobid = r_pair.jobid and table_pairs.id1=r_pair.id1
                )
          );

      select * into r_start
        from (
          select * from table_start ts 
            where ts.jobid = r_pair.jobid and ts.id1 = r_pair.id1 
              and trunc(time_start,'mi') <= trunc(r_pair.time_end, 'mi')
              and trunc(ts.time_start, 'mi') = trunc(v_start_min, 'mi')
            order by time_start
          )
        where rownum = 1;

      update table_pairs set 
          tdate = r_start.tdate,
          time_start = v_start_min,
          time_start_max = v_start_max,
          msg_start = r_start.msg
        where current of c_success;

    exception when no_data_found then 
      null;  -- no matching starting process
    end;

  end loop;

  begin -- add started and not finished processes

    insert into table_pairs (id1, jobid, time_start, tdate, msg_start)
    select id1, jobid, time_start, tdate, msg
      from (
        select * from table_start 
        minus
        select ts.*
          from table_start ts 
            join table_pairs tp
              on ts.jobid = tp.jobid and ts.id1=tp.id1
                and trunc(ts.time_start, 'mi') 
                  between trunc(tp.time_start, 'mi') and trunc(tp.time_start_max, 'mi') 
        );
  end;

end p_pairs;

Input data preparation:

create table TABLE_START
(
  ID1        NUMBER,
  TIME_START TIMESTAMP(6),
  TDATE      DATE,
  JOBID      VARCHAR2(10),
  MSG        VARCHAR2(20)
);
insert into table_start 
select 1234, to_date('05/14/2014 10:02:29', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 262' from dual
union all select 1234, to_date('05/14/2014 10:02:31', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 264' from dual
union all select 1234, to_date('05/14/2014 10:02:45', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 278' from dual
union all select 1234, to_date('05/14/2014 10:02:50', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 285' from dual
union all select 1234, to_date('05/14/2014 10:09:04', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 165' from dual
union all select 1234, to_date('05/14/2014 10:09:06', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 2167' from dual
union all select 1234, to_date('05/14/2014 10:09:16', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 2180' from dual
union all select 1234, to_date('05/14/2014 10:09:26', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 2190' from dual
union all select 1234, to_date('05/14/2014 11:45:11', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 8767' from dual
union all select 1234, to_date('05/14/2014 16:48:20', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 878' from dual
union all select 1234, to_date('05/14/2014 19:02:52', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'start 687' from dual
union all select 5678, to_date('05/14/2014 22:02:52', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'pqr', 'start 501' from dual
union all select 5678, to_date('05/14/2014 23:10:40', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abcd', 'start 200' from dual;

create table TABLE_END
(
  ID1      NUMBER,
  TIME_END TIMESTAMP(6),
  TDATE    DATE,
  JOBID    VARCHAR2(10),
  MSG      VARCHAR2(20)
);
insert into table_end
select 1234, to_date('05/14/2014 10:02:52', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 290' from dual
union all select 1234, to_date('05/14/2014 10:09:32', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 4280' from dual
union all select 1234, to_date('05/14/2014 11:45:15', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 8774' from dual
union all select 1234, to_date('05/14/2014 11:45:18', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 8777' from dual
union all select 1234, to_date('05/14/2014 11:45:19', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 8778' from dual
union all select 1234, to_date('05/14/2014 11:45:25', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 8784' from dual
union all select 1234, to_date('05/14/2014 16:48:22', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 880' from dual
union all select 1234, to_date('05/14/2014 19:03:00', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'abc', 'successful 699' from dual
union all select 5678, to_date('05/14/2014 22:03:00', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'pqr', 'successful 250' from dual
union all select 5678, to_date('05/19/2014 14:00:16', 'MM/DD/YYYY HH24:mi:ss'), to_date('05/14/2014', 'MM/DD/YYYY'), 'pqr', 'successful 400' from dual;

Answer 3:

My understanding of the issue is that each row in the start table represents a job of some kind starting, and each row in the success table represents that job finishing. To find out when a job finished, you need to find the row in the success table that matches on the id1 and jobid columns and has the lowest timestamp greater than or equal to the start row's timestamp, unless an earlier row in the start table already matches that success row.

For example the first row in the start table matches the first row in the success table, but the second row in the start table has no match in the success table.

To resolve this I've used nested sub-queries to construct each piece of data needed.

-- start_table and success_table are placeholder table names; START is a reserved word in Oracle,
-- and Oracle does not accept AS before a table alias.
SELECT st.id1, st.jobid, st.TIMESTAMP_START_MSG_RECIEVED AS start_ts, table2.end_ts
FROM start_table st
LEFT OUTER JOIN (
    -- table2: for each end time, keep only the earliest start that maps to it
    SELECT table1.id1, table1.jobid, MIN(table1.start_ts) AS start_ts, table1.end_ts
    FROM (
        -- table1: for each start row, the lowest success time at or after it
        SELECT s.id1, s.jobid, s.TIMESTAMP_START_MSG_RECIEVED AS start_ts, MIN(t.TIMESTAMP_SUCCESS_MSG_RECIEVED) AS end_ts
        FROM start_table s
        LEFT OUTER JOIN success_table t ON t.id1 = s.id1 AND t.jobid = s.jobid AND t.TIMESTAMP_SUCCESS_MSG_RECIEVED >= s.TIMESTAMP_START_MSG_RECIEVED
        GROUP BY s.id1, s.jobid, s.TIMESTAMP_START_MSG_RECIEVED) table1
    GROUP BY table1.id1, table1.jobid, table1.end_ts) table2 ON table2.id1 = st.id1 AND table2.jobid = st.jobid AND table2.start_ts = st.TIMESTAMP_START_MSG_RECIEVED
ORDER BY start_ts

The innermost select gets each start row and the lowest end time from the success table.

The next select then gets the rows with the lowest start time from table1. The outer select then joins the start table with table2 to include all the jobs that have not finished.
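
The same layering can also be sketched so that it returns one row per run and keeps success rows that have no start, which the expected result in the question asks for. This is an untested sketch under several assumptions: it uses the TABLE_START and TABLE_END tables from Answer 2 rather than the tables above, it borrows Answer 2's idea of collapsing success messages within the same minute into one event (keeping the latest), and the names ends, starts and first_starts are made up for illustration.

```sql
-- Sketch only: assumes TABLE_START / TABLE_END from Answer 2's data preparation.
with ends as (
  -- one success event per id1, jobid and minute (latest timestamp in that minute)
  select id1, jobid, max(time_end) as time_end
    from table_end
   group by id1, jobid, trunc(time_end, 'mi')
),
starts as (
  -- for every start row, the first success event at or after it (null if none)
  select s.id1, s.jobid, s.time_start,
         (select min(e.time_end)
            from ends e
           where e.id1 = s.id1
             and e.jobid = s.jobid
             and e.time_end >= s.time_start) as time_end
    from table_start s
),
first_starts as (
  -- the earliest start among all starts that lead to the same success event
  select id1, jobid, min(time_start) as time_start, time_end
    from starts
   group by id1, jobid, time_end
)
select coalesce(f.id1, e.id1)     as id1,
       coalesce(f.jobid, e.jobid) as jobid,
       f.time_start,
       e.time_end
  from first_starts f
  full outer join ends e
    on e.id1 = f.id1
   and e.jobid = f.jobid
   and e.time_end = f.time_end
 order by coalesce(f.id1, e.id1), coalesce(f.time_start, e.time_end);
```

The full outer join is what keeps both kinds of unmatched rows: starts whose time_end is null never satisfy the join condition, and success events that no start points at appear with a null time_start. If the success messages should not be merged per minute, the ends subquery can select the rows from table_end directly.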

Source: https://stackoverflow.com/questions/28444842/sql-min-values-from-two-columns-across-two-tables-against-id

Tags

sql

Oracle

oracle11g