PostgreSQL - Select distinct(column1, column2) where a condition holds

Question

I have the following table and some sample records in it:

  id  | attr1_id | attr2_id |      user_id      | rating_id |
------+----------+----------+-------------------+-----------+
 1    |      188 |      201 | user_1@domain.com |         3 |
 2    |      193 |      201 | user_2@domain.com |         2 |
 3    |      193 |      201 | user_2@domain.com |         1 |
 4    |      194 |      201 | user_2@domain.com |         1 |
 5    |      194 |      201 | user_1@domain.com |         1 |
 6    |      192 |      201 | user_2@domain.com |         1 |

The combination of (attr1_id, attr2_id, user_id) is UNIQUE, meaning each user can only create one record with a specific pair of attribute ids.

My goal is to select all distinct combinations of (attr1_id, attr2_id) where rating_id = 1, selecting each combination of attr1_id and attr2_id only once, and only where no other row (by other users) exists that has rating_id > 1 and refers to the same attr1_id and attr2_id. Note that the combination of attr1_id and attr2_id can be switched around, so given these two records:

  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         3 |
  21  |       2  |       5  | user_2@domain.com  |         1 |

no row should be counted, as the rows refer to the same combination of attr_ids and one of them has rating_id > 1.

However, if these two rows exist:

  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         1 |
  21  |       2  |       5  | user_2@domain.com  |         1 |
  22  |       2  |       5  | user_3@domain.com  |         1 |

all rows should only be counted as one, because they all share the same combination of attr1_id and attr2_id and all have rating_id = 1.

In addition, there is some joining and filtering by a joined table column which I'll leave out, but I thought I'd mention it anyway.

SQL Fiddle isn't working for me right now, but I've uploaded some sample data from the compatibility table.

My query so far is this:

SELECT distinct(a1, a2),
       a1,
       a2
FROM
  ( SELECT c.*,
           least(attr1_id, attr2_id) AS a1,
           greatest(attr1_id, attr2_id) AS a2
   FROM compatibility c
   JOIN attribute a ON c.attr1_id = a.id
   JOIN PARAMETER pa ON a.parameter_id = pa.id
   JOIN problem p ON pa.problem_id = p.id
   WHERE p.id = 1
   GROUP BY 1, 2
   HAVING NOT bool_or(rating_id > 1)) s;

In the sample, there are a total of 144 ratings. Each user has created 7 ratings that have rating_id > 1, and of those 14 ratings, 2 refer to the same set of (attr1_id, attr2_id). Hence, the number I'm looking for would be 77 - 12 = 65. However, the result here seems to be 77 - 2 = 75. So only the rows where two ratings with the same attribute ids exist are discarded.

I would also point out my previous question for this matter where I was asked to open a new one.

Answer 1:

I think this does what you describe:

-- One row per unordered pair; keep a pair only if every one of its ratings is 1.
select least(attr1_id, attr2_id) as attr1, greatest(attr1_id, attr2_id) as attr2
from compatibility t
group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
having bool_and(rating_id = 1);

I don't understand the other tables in your query, because you start with a single table that has everything you need.
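As a quick sanity check of that logic outside the database, here is a minimal Python sketch of the same idea: normalize each pair the way least()/greatest() would, group by the normalized pair, and keep a pair only when all of its ratings equal 1. The function name pairs_rated_only_one is hypothetical, and the rows are the sample records from the question reduced to (attr1_id, attr2_id, rating_id); it is an illustration of the grouping rule, not a replacement for the query.

```python
from collections import defaultdict


def pairs_rated_only_one(rows):
    """Return sorted (low, high) attribute pairs whose ratings are all 1.

    rows: iterable of (attr1_id, attr2_id, rating_id) tuples.
    (Hypothetical helper, used only for illustration.)
    """
    ratings_by_pair = defaultdict(list)
    for attr1_id, attr2_id, rating_id in rows:
        # Same normalization as least()/greatest(): (5, 2) and (2, 5) share a key.
        key = (min(attr1_id, attr2_id), max(attr1_id, attr2_id))
        ratings_by_pair[key].append(rating_id)
    # Same filter as HAVING bool_and(rating_id = 1).
    return sorted(
        pair
        for pair, ratings in ratings_by_pair.items()
        if all(r == 1 for r in ratings)
    )


# The six sample rows from the question, as (attr1_id, attr2_id, rating_id).
sample = [(188, 201, 3), (193, 201, 2), (193, 201, 1),
          (194, 201, 1), (194, 201, 1), (192, 201, 1)]
# The two follow-up scenarios: one swapped pair with a rating of 3, one all 1s.
mixed = [(5, 2, 3), (2, 5, 1)]
all_ones = [(5, 2, 1), (2, 5, 1), (2, 5, 1)]

print(pairs_rated_only_one(sample))    # [(192, 201), (194, 201)]
print(pairs_rated_only_one(mixed))     # []
print(pairs_rated_only_one(all_ones))  # [(2, 5)]
```

The second and third calls correspond to the two scenarios in the question: the swapped pair with a rating of 3 yields nothing, and the three rows that all have rating_id = 1 collapse into a single pair.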

Source: https://stackoverflow.com/questions/26908868/postgresql-select-distinctcolumn1-column2-where-a-condition-holds

Tags

sql

postgresql

select

count

distinct