PostgreSQL where all in array

情书的邮戳 2020-11-30 07:23

What is the easiest and fastest way to achieve a clause where all elements in an array must be matched - not only one when using IN? After all it should behave

9 answers

渐次进展 (OP)

2020-11-30 08:18
While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
```
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
  RETURNS SETOF conversations AS
$BODY$
DECLARE
    _sql text := '
    SELECT c.*
    FROM   conversations c';
    i int;
BEGIN

FOREACH i IN ARRAY _user_arr LOOP
    _sql  := _sql  || '
    JOIN   conversations_users x' || i || ' USING (conversation_id)';
END LOOP;

_sql  := _sql  || '
    WHERE  TRUE';

FOREACH i IN ARRAY _user_arr LOOP
    _sql  := _sql  || '
    AND    x' || i || '.user_id = ' || i;
END LOOP;

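/* uncomment for conversations with exact list of users and no more
   (the array is embedded as a literal, because the variable _user_arr
   is not visible inside the dynamically executed statement)
_sql  := _sql  || '
    AND    NOT EXISTS (
        SELECT 1
        FROM   conversations_users u
        WHERE  u.conversation_id = c.conversation_id
        AND    u.user_id <> ALL (' || quote_literal(_user_arr::text) || '::int[])
        )';
*/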

-- RAISE NOTICE '%', _sql;
RETURN QUERY EXECUTE _sql;

END;
$BODY$ LANGUAGE plpgsql STABLE;  -- read-only, so STABLE is sufficient
```
Call:
```
SELECT * FROM f_conversations_among_users('{1,2}')
```
The function dynamically builds and executes a query of the form:
```
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
...
WHERE  TRUE
AND    x1.user_id = 1
AND    x2.user_id = 2
...
```
This form performed best in an extensive test of queries for relational division.

You could also build the query in your app, but I assumed that you want to pass a single array parameter. Also, this is probably fastest anyway.

Either query requires an index like the following to be fast:
```
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
```
A multi-column primary (or unique) key on (user_id, conversation_id) works just as well, but one on (conversation_id, user_id) (which you may very well have!) would be inferior, because user_id is not the leading column. You can find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.

I also assume you have a primary key on conversations.conversation_id.
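
To make those assumptions concrete, here is a minimal sketch of a schema that would fit. The table and column names come from the function above; the data types and constraints are my guess, not something the question states:

```sql
-- Assumed schema; only the table and column names appear in the answer
CREATE TABLE conversations (
    conversation_id serial PRIMARY KEY
);

CREATE TABLE conversations_users (
    user_id         int NOT NULL,
    conversation_id int NOT NULL REFERENCES conversations,
    -- user_id leads, so this key also serves lookups by user_id
    PRIMARY KEY (user_id, conversation_id)
);
```

With a key in this column order, the separate index on (user_id) should not be needed.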

Can you run a performance test with EXPLAIN ANALYZE on @Alex's query and this function and report your findings?
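
A rough sketch of such a test for users 1 and 2. The first statement is only my assumption of what the IN and count() query looks like, and it relies on (user_id, conversation_id) being unique. For the function, note that EXPLAIN on the function call itself shows just a function scan, so it is more informative to test the generated statement (you can get it from the RAISE NOTICE line in the function):

```sql
-- Assumed form of the IN + count() query
EXPLAIN ANALYZE
SELECT conversation_id
FROM   conversations_users
WHERE  user_id IN (1, 2)
GROUP  BY conversation_id
HAVING count(*) = 2;   -- must equal the number of users searched for

-- The statement the function generates for '{1,2}'
EXPLAIN ANALYZE
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
WHERE  TRUE
AND    x1.user_id = 1
AND    x2.user_id = 2;
```

Run each statement a few times and compare the reported execution times, since the first run may be slowed by a cold cache.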

Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
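
For illustration, the exact-match variant written out as a plain query for users 1 and 2 might look like this (a sketch with hard-coded values, not output copied from the function):

```sql
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
WHERE  x1.user_id = 1
AND    x2.user_id = 2
AND    NOT EXISTS (          -- no participant outside the given list
   SELECT 1
   FROM   conversations_users u
   WHERE  u.conversation_id = c.conversation_id
   AND    u.user_id <> ALL ('{1,2}'::int[])
   );
```

The NOT EXISTS subquery looks up rows by conversation_id, so an index with conversation_id as the leading column probably helps this variant.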

Tell me if you need more explanation on the features of the function.