CTE to represent a logical table for the rows in a table which have the max value in one column

Question

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below, ids 10 and 12 have each been logically updated once, so each has two physical rows.

This is the physical representation of the table:

seq | id  | name   | CRUD |
----|-----|--------|------|
1   | 10  | john   | C    |
2   | 10  | joe    | U    |
3   | 11  | kent   | C    |
4   | 12  | katie  | C    |
5   | 12  | sue    | U    |
6   | 13  | jill   | C    |
7   | 14  | bill   | C    |

This is the logical representation of the table, considering the "most recent" records:

seq | id  | name   | CRUD |
----|-----|--------|------|
2   | 10  | joe    | U    |
3   | 11  | kent   | C    |
5   | 12  | sue    | U    |
6   | 13  | jill   | C    |
7   | 14  | bill   | C    |

In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:

SELECT 
    *
FROM
    PEOPLE P
WHERE       
    P.ID = 12
AND
    P.SEQ = (
        SELECT
            MAX(P1.SEQ)
        FROM
            PEOPLE P1
        WHERE P1.ID = P.ID
    )

...and I would receive this row:

seq | id  | name   | CRUD |
----|-----|--------|------|
5   | 12  | sue    | U    |
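
For anyone who wants to reproduce the result above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module and runs the subquery form. SQLite and the lowercase column names are my assumptions for the sake of a runnable example (the question itself is about DB2), and the subquery is correlated on `p1.id = p.id` so that `MAX` only sees rows for the same id.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 10, "john", "C"), (2, 10, "joe", "U"), (3, 11, "kent", "C"),
     (4, 12, "katie", "C"), (5, 12, "sue", "U"), (6, 13, "jill", "C"),
     (7, 14, "bill", "C")],
)

# Subquery form: the inner query is correlated to the outer row's id.
latest_for_12 = conn.execute("""
    SELECT p.seq, p.id, p.name, p.crud
    FROM people p
    WHERE p.id = 12
      AND p.seq = (SELECT MAX(p1.seq) FROM people p1 WHERE p1.id = p.id)
""").fetchall()

print(latest_for_12)  # [(5, 12, 'sue', 'U')]
```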

What I'd rather do is something like this:

WITH
    NEW_P
AS
(
    --CTE representing all of the most recent records
    --i.e. for any given id, the most recent sequence
)

SELECT 
    *
FROM
    NEW_P P2
WHERE       
    P2.ID = 12

The first SQL example, using the subquery, already works for us.

Question: How can I leverage a CTE to simplify our predicates when we need the "most recent" logical view of the table? In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.

P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.

Answer 1:

This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:

WITH
    ORD_P
AS
(
   SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
   FROM people p
)
,
    NEW_P
AS 
(
    SELECT * from ORD_P
    WHERE rn = 1
)
SELECT 
    *
FROM
    NEW_P P2
WHERE       
    P2.ID = 12

PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
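
Since the query above is marked as untested, here is a rough way to try the same idea against the sample data from the question. It is only a sketch: it assumes SQLite 3.25 or newer (the first version with window functions) driven from Python's `sqlite3` module, not DB2, and it lists the columns explicitly so the helper column `rn` does not leak into the final result.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 10, "john", "C"), (2, 10, "joe", "U"), (3, 11, "kent", "C"),
     (4, 12, "katie", "C"), (5, 12, "sue", "U"), (6, 13, "jill", "C"),
     (7, 14, "bill", "C")],
)

# Number the rows of each id from newest to oldest, then keep row 1 per id.
newest_rows = conn.execute("""
    WITH ord_p AS (
        SELECT p.seq, p.id, p.name, p.crud,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.seq DESC) AS rn
        FROM people p
    ),
    new_p AS (
        SELECT seq, id, name, crud FROM ord_p WHERE rn = 1
    )
    SELECT seq, id, name, crud FROM new_p ORDER BY id
""").fetchall()

for row in newest_rows:
    print(row)
```

On the sample data this should give back the five rows of the "logical" table from the question, one per id.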

Answer 2:

I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:

WITH newp AS (
  SELECT id, MAX(seq) AS latestseq
    FROM people
    GROUP BY id
)
SELECT p.*
  FROM people p
  JOIN newp n ON (n.latestseq = p.seq)
  ORDER BY p.id

What you originally had would work, as would moving the CTE's query into the "from" clause as a derived table. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
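
To illustrate the "from" clause variant mentioned above, here is a small sketch with the grouped query written as a derived table instead of a CTE. It runs on SQLite via Python's `sqlite3` purely as an assumption for testing; the extra `n.id = p.id` join condition is my addition and makes no difference as long as `seq` is unique across the table.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 10, "john", "C"), (2, 10, "joe", "U"), (3, 11, "kent", "C"),
     (4, 12, "katie", "C"), (5, 12, "sue", "U"), (6, 13, "jill", "C"),
     (7, 14, "bill", "C")],
)

# Same GROUP BY / JOIN idea, with the grouped query inlined in FROM.
latest = conn.execute("""
    SELECT p.seq, p.id, p.name, p.crud
    FROM people p
    JOIN (
        SELECT id, MAX(seq) AS latestseq
        FROM people
        GROUP BY id
    ) n ON n.id = p.id AND n.latestseq = p.seq
    ORDER BY p.id
""").fetchall()

for row in latest:
    print(row)
```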

Answer 3:

Following up from @Glenn's answer, here is an updated query which meets my original goal and is on par with @mustaccio's answer. I'm still not sure, though, what the performance (and other) implications of this approach are compared with the other.

WITH
    LATEST_PERSON_SEQS AS
    (
        SELECT
            ID,
            MAX(SEQ) AS LATEST_SEQ
        FROM
            PEOPLE
        GROUP BY
            ID
    )
    ,
    LATEST_PERSON AS
    (
        SELECT
            P.*
        FROM
            PEOPLE P
        JOIN
            LATEST_PERSON_SEQS L
        ON
            (
                L.LATEST_SEQ = P.SEQ)
    )
SELECT
    *
FROM
    LATEST_PERSON L2
WHERE
    L2.ID = 12
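
As a sanity check that the two approaches at least agree on their results, the sketch below runs the GROUP BY/JOIN form and the ROW_NUMBER form over the sample rows and compares the output. It assumes SQLite 3.25+ through Python's `sqlite3` and a table named `people` as in the question. It says nothing about performance, which depends on the engine and the available indexes and would have to be checked with that engine's own explain facility.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 10, "john", "C"), (2, 10, "joe", "U"), (3, 11, "kent", "C"),
     (4, 12, "katie", "C"), (5, 12, "sue", "U"), (6, 13, "jill", "C"),
     (7, 14, "bill", "C")],
)

# Approach 1: find MAX(seq) per id, then join back to the table.
group_by_sql = """
    WITH latest_person_seqs AS (
        SELECT id, MAX(seq) AS latest_seq FROM people GROUP BY id
    )
    SELECT p.seq, p.id, p.name, p.crud
    FROM people p
    JOIN latest_person_seqs l ON l.latest_seq = p.seq
    ORDER BY p.id
"""

# Approach 2: rank rows per id by seq descending and keep the first.
window_sql = """
    WITH ord_p AS (
        SELECT seq, id, name, crud,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) AS rn
        FROM people
    )
    SELECT seq, id, name, crud FROM ord_p WHERE rn = 1 ORDER BY id
"""

group_by_rows = conn.execute(group_by_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
same_result = group_by_rows == window_rows
print(same_result)
```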

Source: https://stackoverflow.com/questions/26426952/cte-to-represent-a-logical-table-for-the-rows-in-a-table-which-have-the-max-valu

Tags

sql

db2

common-table-expression