PostgreSQL: Grouping then filtering table, with condition for nonexistence

こ雲淡風輕ζ 提交于 2020-01-16 04:15:07

问题


In PostgreSQL, I have a table that, abstractly, looks like this:

╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝

I want to transform it in a query into this:

╔═══╦═══╦══════╗
║ A ║ B ║  D   ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1    ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0    ║
╚═══╩═══╩══════╝

…such that:

  1. The input table’s rows are grouped by A and B, and
  2. For each A and B pair:

    • If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.

      For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.

    • Otherwise, if no such row exists, then the output table has a row (A, B, null).

      For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.

I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.


回答1:


To get a single row from each group with the same (A, B) there is a simple, short and fast way: DISTINCT ON - not involving aggregate functions, window functions, or subqueries at all:

SELECT DISTINCT ON (A, B)
       A, B, CASE WHEN A = C THEN D END AS D
FROM   tbl
ORDER  BY A, B, (A = C) DESC;

Produces your desired result exactly.

db<>fiddle here

Assuming all involved columns are defined NOT NULL, or you need to do more.

The final ORDER BY item (A = C) DESC sorts the row with A = C first per group. It's a boolean expression, and FALSE sorts before TRUE. If there can be multiple rows, add more ORDER BY items to break ties.

CASE WHEN A = C THEN D END implements your requirement that D is only output for the given condition. Else we get NULL (the default for CASE), as desired.

Detailed explanation:

  • Select first row in each GROUP BY group?
  • Sorting null values after all others, except special

More performance optimization may be possible for big tables:

  • Optimize GROUP BY query to retrieve latest row per user



回答2:


Use a CTE which returns all the rows where A = C and join to the table:

with cte as (
  select * from tablename
  where "A" = "C"
)  
select distinct t."A", t."B", c."D"
from tablename t left join cte c
on c."A" = t."A" and c."B" = t."B"
order by t."A", t."B"

See the demo.
Results:

| A   | B   | D   |
| --- | --- | --- |
| x   | 0   | 1   |
| x   | 1   |     |
| y   | 0   |     |
| y   | 1   | 0   |


来源:https://stackoverflow.com/questions/57184462/postgresql-grouping-then-filtering-table-with-condition-for-nonexistence

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!