Selecting rows ordered by some column and distinct on another

天命终不由人 2020-12-01 07:57

Related to - PostgreSQL DISTINCT ON with different ORDER BY

I have a table purchases (product_id, purchased_at, address_id)

Sample data:

| id |         


        
3 Answers
  • 2020-12-01 08:27

    This query is trickier to rephrase properly than it looks.

    The currently accepted, join-based answer doesn’t correctly handle the case where two candidate rows have the same given purchased_at value: it will return both rows.

    You can get the right behaviour this way:

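    SELECT * FROM purchases AS given
    WHERE product_id = 2
    AND NOT EXISTS (
        SELECT NULL FROM purchases AS other
        WHERE other.product_id = 2
        AND given.address_id = other.address_id
        AND (
            given.purchased_at < other.purchased_at
            -- tie on purchased_at: fall back to comparing id
            OR (given.purchased_at = other.purchased_at AND given.id < other.id)
        )
    )
    ORDER BY purchased_at DESC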
    

    Note how it has a fallback of comparing id values to disambiguate the case in which the purchased_at values match. This ensures that the condition can only ever be true for a single row among those that have the same address_id value.

    The original query using DISTINCT ON handles this case automatically!
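
To see the tie case concretely, here is a small sketch that runs the join-based query and a NOT EXISTS query side by side. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the three sample rows are made up for illustration. The id fallback is written out in full so that it only applies when the purchased_at values are equal, and an extra `t1.id` sort key is added to the join-based query purely to make the output order deterministic.

```python
import sqlite3

# Hypothetical sample data: (id, product_id, purchased_at, address_id).
# Address 10 has two purchases with the same purchased_at; address 20 has one.
rows = [
    (1, 2, "2020-01-01", 10),
    (2, 2, "2020-01-01", 10),
    (3, 2, "2020-01-02", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, product_id INTEGER,"
    " purchased_at TEXT, address_id INTEGER)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)

# Join-based approach: keeps every row that has no strictly later row
# for the same address, so tied rows all survive.
JOIN_BASED = """
SELECT t1.id FROM purchases t1
LEFT JOIN purchases t2
  ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC, t1.id
"""

# NOT EXISTS approach: the id comparison only kicks in on a purchased_at tie,
# so exactly one row per address survives.
NOT_EXISTS = """
SELECT given.id FROM purchases AS given
WHERE given.product_id = 2
AND NOT EXISTS (
    SELECT NULL FROM purchases AS other
    WHERE other.product_id = 2
    AND given.address_id = other.address_id
    AND (given.purchased_at < other.purchased_at
         OR (given.purchased_at = other.purchased_at AND given.id < other.id))
)
ORDER BY given.purchased_at DESC
"""

join_ids = [r[0] for r in conn.execute(JOIN_BASED)]
not_exists_ids = [r[0] for r in conn.execute(NOT_EXISTS)]
print(join_ids)        # [3, 1, 2] -> address 10 appears twice
print(not_exists_ids)  # [3, 2]    -> one row per address
```

With this tie-break the row with the higher id wins; flipping the id comparison would keep the lower one instead.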

    Also note the way that you are forced to encode the fact that you want “the latest for each address_id” twice, both in the given.purchased_at < other.purchased_at condition and the ORDER BY purchased_at DESC clause, and you have to make sure they match. I had to spend a few extra minutes to convince myself that this query is really positively correct.

    It’s much easier to write this query correctly and understandably by using DISTINCT ON together with an outer subquery, as suggested by dbenhur.

  • 2020-12-01 08:29

    DISTINCT ON uses your ORDER BY to pick which row to return for each distinct address_id. If you then want to order the resulting records differently, make the DISTINCT ON a subselect and order its results:

    SELECT * FROM
    (
      SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
      FROM "purchases"
      WHERE "purchases"."product_id" = 2
      ORDER BY purchases.address_id ASC, purchases.purchased_at DESC
    ) distinct_addrs
    ORDER BY distinct_addrs.purchased_at DESC
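
As a rough mental model of the two ordering levels (not a description of how PostgreSQL actually executes the query), the inner and outer steps can be mimicked in plain Python. The rows below are hypothetical and assumed to be already filtered to product_id = 2.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (id, purchased_at, address_id).
purchases = [
    (1, "2020-01-01", 10),
    (2, "2020-01-05", 10),
    (3, "2020-01-03", 20),
    (4, "2020-01-09", 30),
]

# Inner query: ORDER BY address_id ASC, purchased_at DESC.
# Two stable sorts, secondary key first, then the primary key.
by_address = sorted(purchases, key=itemgetter(1), reverse=True)
by_address.sort(key=itemgetter(2))

# DISTINCT ON (address_id): keep only the first row of each address group.
latest_per_address = [
    next(group) for _, group in groupby(by_address, key=itemgetter(2))
]

# Outer query: ORDER BY purchased_at DESC over the surviving rows.
result = sorted(latest_per_address, key=itemgetter(1), reverse=True)

print([row[0] for row in latest_per_address])  # [2, 3, 4]
print([row[0] for row in result])              # [4, 2, 3]
```

The inner ordering decides which row wins within each address; the outer ordering only rearranges the winners.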
    
  • 2020-12-01 08:32

    Quite a clear question :)

    SELECT t1.* FROM purchases t1
    LEFT JOIN purchases t2
    ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
    WHERE t2.purchased_at IS NULL
    ORDER BY t1.purchased_at DESC
    

    And most likely a faster approach:

    SELECT t1.* FROM purchases t1
    JOIN (
        SELECT address_id, max(purchased_at) max_purchased_at
        FROM purchases
        GROUP BY address_id
    ) t2
    ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
    ORDER BY t1.purchased_at DESC
    