Subquery to return the latest entry for each parent ID

落爺英雄遲暮 提交于 2019-12-10 10:03:25

问题


I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.

I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.

Thus for


    DOCUMENTS
    ID | NAME
    1  | Document 1
    2  | Document 2
    3  | Document 3
    4  | Document 4
    5  | Document 5

    HISTORY
    DOC_ID | USER_ID | TIMESTAMP
    1      | 12345   | TODAY
    1      | 11111   | IN THE PAST
    1      | 11111   | IN THE PAST
    1      | 12345   | IN THE PAST
    2      | 11111   | TODAY
    2      | 12345   | IN THE PAST
    3      | 12345   | IN THE PAST

I'd be looking to get a return from my search like


    ID | NAME       | LAST_USER_ID
    1  | Document 1 | 12345
    2  | Document 2 | 11111
    3  | Document 3 | 12345
    4  | Document 4 | 
    5  | Document 5 | 

Can I easily do this with one SQL query and a join between the two tables?


回答1:


Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":

SELECT d.id, d.name, h.last_user_id
    FROM Documents d LEFT JOIN
         (SELECT r.doc_id AS id, user_id AS last_user_id
              FROM History r JOIN
                   (SELECT doc_id, MAX("timestamp") AS "timestamp"
                        FROM History
                        GROUP BY doc_id
                   ) AS l
                   ON  r."timestamp" = l."timestamp"
                   AND r.doc_id      = l.doc_id
         ) AS h
         ON d.id = h.id

I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter.

The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.




回答2:


I couldn't get the "HAVING MAX(TIMESTAMP)" to run in SQL Server - I guess having requires a boolean expression like "having max(TIMESTAMP) > 2009-03-05" or something, which doesn't apply in this case. (I might be doing something wrong...)

Here is something that seems to work - note the join has 2 conditions (not sure if this is good or not):

select
    d.ID,
    d.NAME,
    h."USER_ID" as "LAST_USER_ID"
from Documents d
left join History h
    on d.ID = h.DOC_ID
    and h."TIMESTAMP" =
    (
        select max("TIMESTAMP")
        from "HISTORY"
        where "DOC_ID" = d.ID
    )



回答3:


This doesn't use a join, but for some queries like this I like to inline the select for the field. If you want to catch the situation when no user has accessed you can wrap it with an NVL().

select a.ID, a.NAME,
(select x.user_id
 from HISTORY x
 where x.doc_id = a.id
   and x.timestamp = (select max(x1.timestamp)
                      from HISTORY x1
                      where x1.doc_id = x.doc_id)) as LAST_USER_ID
from DOCUMENTS a
where <your criteria here>



回答4:


I think it should be something like this:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a LEFT JOIN
    ( SELECT DOC_ID, USER_ID 
          FROM HISTORY
              GROUP BY DOC_ID, USER_ID
              HAVING MAX( TIMESTAMP )) as b
    ON a.ID = b.DOC_ID

this might work also:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a 
  LEFT JOIN HISTORY b ON a.ID = b.DOC_ID
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )



回答5:


Select ID, Name, User_ID
From Documents Left Outer Join
History a on ID = DOC_ID
Where ( TimeStamp = ( Select Max(TimeStamp)
                      From History b
                      Where a.DOC_ID = b.DOC_ID ) OR
        TimeStamp Is NULL )  /* this accomodates the Left */


来源:https://stackoverflow.com/questions/625055/subquery-to-return-the-latest-entry-for-each-parent-id

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!