Delete all rows but one with the greatest value per group

Submitted by 和自甴很熟 on 2019-12-11 08:24:03

Question


So, I just recently asked a question: Update using a subquery with aggregates and groupby in Postgres
and it turns out I was going about my issue with flawed logic.

In the same scenario in the question above, instead of updating all the rows to have the max quantity, I'd like to delete the rows that don't have the max quantity (and any duplicate max quantities).

Essentially I need to just convert the below to a delete statement that preserves only the largest quantities per item_name. I'm guessing I'm going to need NOT EXISTS here but I'm not sure how to do that with aggregate functions.

UPDATE transaction t
SET    quantity = sub.max_quantity
FROM  (
     SELECT item_name, max(quantity) AS max_quantity
     FROM   transaction
     GROUP  BY 1
) sub
WHERE t.item_name = sub.item_name
AND   t.quantity IS DISTINCT FROM sub.max_quantity;

Answer 1:


Since there can be peers sharing the same maximum quantity, the safe route is a subquery with the window function row_number():

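DELETE FROM transaction t
USING (
   -- rn = 1 marks the single row to keep per item_name
   SELECT some_unique_id, row_number() OVER (PARTITION BY item_name
                                             ORDER BY quantity DESC) AS rn
   FROM   transaction
   ) sub
WHERE t.some_unique_id = sub.some_unique_id
AND   sub.rn > 1;

Here some_unique_id can be any unique column or combination of columns, ideally the primary key. The subquery needs no GROUP BY: row_number() already numbers every row within its item_name partition, and a GROUP BY on some_unique_id would raise an error unless that column is the primary key, because item_name and quantity would then be ungrouped.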
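To see the row_number() approach act on concrete rows, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are assumptions for illustration and not part of the question: the column name id standing in for some_unique_id, the sample rows, the helper name keep_max_per_item, and the extra id tiebreaker in ORDER BY, which only makes the surviving peer deterministic. SQLite has no DELETE ... USING, so the ranked subquery is referenced through IN instead; that form is plain SQL and should also be accepted by Postgres. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Ranked delete written without DELETE ... USING so SQLite accepts it.
# "transaction" is quoted because it is a keyword in SQLite.
DELETE_SQL = """
DELETE FROM "transaction"
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY item_name
                                  ORDER BY quantity DESC, id) AS rn
        FROM "transaction"
    ) AS ranked
    WHERE rn > 1
)
"""


def keep_max_per_item(rows):
    """Load rows into a throwaway table, run the delete, return survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "transaction" ('
        "id INTEGER PRIMARY KEY, item_name TEXT, quantity INTEGER)"
    )
    conn.executemany('INSERT INTO "transaction" VALUES (?, ?, ?)', rows)
    conn.execute(DELETE_SQL)
    survivors = conn.execute(
        'SELECT id, item_name, quantity FROM "transaction" ORDER BY id'
    ).fetchall()
    conn.close()
    return survivors


# Hypothetical sample data: ids 2 and 3 tie for the apple maximum.
sample = [
    (1, "apple", 5),
    (2, "apple", 9),
    (3, "apple", 9),
    (4, "pear", 2),
    (5, "pear", 1),
]
print(keep_max_per_item(sample))
```

With this sample, the rows left are (2, 'apple', 9) and (4, 'pear', 2): one row per item_name, and only one of the two tied apple rows. One caveat when carrying this over to Postgres: there, ORDER BY quantity DESC sorts NULL first by default, so if quantity can be NULL it is probably worth writing ORDER BY quantity DESC NULLS LAST to avoid keeping a NULL row over a real maximum.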

This ends up being very similar to this question from today:
Delete rows with duplicates on two fields

If your table is big and you are going to delete large parts of it, consider the advanced advice here:
How to delete duplicate entries?



Source: https://stackoverflow.com/questions/22210878/delete-all-rows-but-one-with-the-greatest-value-per-group
