What is the easiest and fastest way to achieve a clause where all elements in an array must be matched - not only one when using IN? After all it should behave
While @Alex' answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
  RETURNS SETOF conversations AS
$BODY$
DECLARE
   _sql text := '
   SELECT c.*
   FROM   conversations c';
   i int;
BEGIN
   -- one self-join on conversations_users per user in the array
   FOREACH i IN ARRAY _user_arr LOOP
      _sql := _sql || '
   JOIN   conversations_users x' || i || ' USING (conversation_id)';
   END LOOP;
   _sql := _sql || '
   WHERE  TRUE';
   -- one filter per join, pinning each alias to one user
   FOREACH i IN ARRAY _user_arr LOOP
      _sql := _sql || '
   AND    x' || i || '.user_id = ' || i;
   END LOOP;
   /* uncomment for conversations with exact list of users and no more
   _sql := _sql || '
   AND    NOT EXISTS (
      SELECT 1
      FROM   conversations_users u
      WHERE  u.conversation_id = c.conversation_id
      AND    u.user_id <> ALL ($1)
      )';
   */
   -- RAISE NOTICE '%', _sql;
   -- _user_arr is passed as $1; it is only referenced by the optional clause above
   RETURN QUERY EXECUTE _sql USING _user_arr;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_conversations_among_users('{1,2}');
The function dynamically builds and executes a query of the form:
SELECT c.*
FROM conversations c
JOIN conversations_users x1 USING (conversation_id)
JOIN conversations_users x2 USING (conversation_id)
...
WHERE TRUE
AND x1.user_id = 1
AND x2.user_id = 2
...
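To try this query form by hand, a minimal test setup could look like the sketch below. Only the table names and the columns conversation_id and user_id come from the function above; the rest of the schema and the sample rows are assumptions made for illustration.

```sql
-- Hypothetical minimal schema: only table and key column names are
-- taken from the function above, everything else is assumed.
CREATE TABLE conversations (
   conversation_id int PRIMARY KEY
);

CREATE TABLE conversations_users (
   user_id         int NOT NULL,
   conversation_id int NOT NULL REFERENCES conversations,
   PRIMARY KEY (user_id, conversation_id)
);

INSERT INTO conversations VALUES (1), (2), (3);

INSERT INTO conversations_users (conversation_id, user_id) VALUES
   (1, 1), (1, 2)           -- exactly users 1 and 2
 , (2, 1), (2, 2), (2, 3)   -- users 1 and 2 plus user 3
 , (3, 1), (3, 3);          -- user 2 is missing

-- The query form shown above, written out for users 1 and 2
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
WHERE  x1.user_id = 1
AND    x2.user_id = 2;
-- conversations 1 and 2 qualify, conversation 3 does not
```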
This form performed best in an extensive test of queries for relational division.
You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.
Either query requires an index like the following to be fast:
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
A multi-column primary (or unique) key on (user_id, conversation_id) is just as well, but one on (conversation_id, user_id) (like you may very well have!) would be inferior. You find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.
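As a rough sketch of how to check which case applies to you: the system view pg_indexes lists the existing index definitions, so you can see which column leads. The constraint name in the second statement is made up, and that statement only applies if the table has no primary key yet.

```sql
-- Show existing indexes on the table, including those backing PK / UNIQUE constraints
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'conversations_users';

-- If no primary key exists yet, one with user_id first serves the
-- per-user lookups of both queries (constraint name is hypothetical)
ALTER TABLE conversations_users
   ADD CONSTRAINT conversations_users_pkey PRIMARY KEY (user_id, conversation_id);
```

If the key is already defined on (conversation_id, user_id), the separate index on (user_id) shown above is the less invasive option.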
I also assume you have a primary key on conversations.conversation_id.
Can you run a performance test with EXPLAIN ANALYZE on @Alex' query and this function and report your findings?
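Such a comparison could be run roughly as follows. The first statement is only my assumption of what a query with IN and count() looks like, not a copy of that answer, and it relies on (user_id, conversation_id) being unique so that count(*) counts distinct users.

```sql
-- Assumed shape of the IN / count() approach for users 1 and 2
EXPLAIN ANALYZE
SELECT conversation_id
FROM   conversations_users
WHERE  user_id IN (1, 2)
GROUP  BY conversation_id
HAVING count(*) = 2;   -- must equal the number of users in the list

-- The function from this answer
EXPLAIN ANALYZE
SELECT * FROM f_conversations_among_users('{1,2}');
```

Note that EXPLAIN ANALYZE reports the function call as a single Function Scan node, so you get the total time but not the plan of the query inside. To inspect that plan, un-comment the RAISE NOTICE line, copy the generated SQL and run EXPLAIN ANALYZE on it directly.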
Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
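For illustration, with the additional clause the query for users 1 and 2 would look roughly like this. The array is written as a literal here for readability; that is a choice made for this sketch only.

```sql
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
WHERE  x1.user_id = 1
AND    x2.user_id = 2
AND    NOT EXISTS (          -- no participant outside the given list
   SELECT 1
   FROM   conversations_users u
   WHERE  u.conversation_id = c.conversation_id
   AND    u.user_id <> ALL ('{1,2}'::int[])
   );
```

The NOT EXISTS anti-join discards every conversation that has at least one participant whose user_id is not in the array.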
Tell me if you need more explanation on the features of the function.