I want to run this query:
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY purchases.purchased_at DESC
But I get this error:
PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Adding address_id
as first ORDER BY
expression silences the error, but I really don't want to add sorting over address_id
. Is it possible to do without ordering by address_id
?
Documentation says:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
So you'll have to add the address_id
to the order by.
Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id
and that result sorted by purchased_at
then you're trying to solve a greatest N per group problem which can be solved by the following approaches:
The general solution that should work in most DBMSs:
SELECT t1.* FROM purchases t1
JOIN (
SELECT address_id, max(purchased_at) max_purchased_at
FROM purchases
WHERE product_id = 1
GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
A more PostgreSQL-oriented solution based on @hkf's answer:
SELECT * FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC
) t
ORDER BY purchased_at DESC
Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another
You can order by address_id in an subquery, then order by what you want in an outer query.
SELECT * FROM
(SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC )
ORDER BY purchased_at DESC
A subquery can solve it:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
) p
ORDER BY purchased_at DESC;
Leading expressions in ORDER BY
have to agree with columns in DISTINCT ON
, so you can't order by different columns in the same SELECT
.
Only use an additional ORDER BY
in the subquery if you want to pick a particular row from each set:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id
) p
ORDER BY purchased_at DESC;
If purchased_at
can be NULL
, consider DESC NULLS LAST
.
Related, with more explanation:
Window function may solve that in one pass:
SELECT DISTINCT ON (address_id)
LAST_VALUE(purchases.address_id) OVER wnd AS address_id
FROM "purchases"
WHERE "purchases"."product_id" = 1
WINDOW wnd AS (
PARTITION BY address_id ORDER BY purchases.purchased_at DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
For anyone using Flask-SQLAlchemy, this worked for me
from app import db
from app.models import Purchases
from sqlalchemy.orm import aliased
from sqlalchemy import desc
stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases')
alias = aliased(Purchases, stmt)
distinct = db.session.query(alias)
distinct.order_by(desc(alias.purchased_at))
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY address_id, purchases.purchased_at DESC
ORDER BY address_id, purchases.purchased_at DESC
address_id must be added in order by for the DISTINCT ON() function
You can also done this by using group by clause
SELECT purchases.address_id, purchases.* FROM "purchases"
WHERE "purchases"."product_id" = 1 GROUP BY address_id,
purchases.purchased_at ORDER purchases.purchased_at DESC
来源:https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by