PostgreSQL where all in array

情书的邮戳 2020-11-30 07:23

What is the easiest and fastest way to write a clause in which all elements of an array must match, not just one as with IN? After all it should behave

9 answers
  •  渐次进展
    2020-11-30 08:18

    While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:

    CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
      RETURNS SETOF conversations AS
    $BODY$
    DECLARE
        _sql text := '
        SELECT c.*
        FROM   conversations c';
        i int;
    BEGIN
    
    FOREACH i IN ARRAY _user_arr LOOP
        _sql  := _sql  || '
        JOIN   conversations_users x' || i || ' USING (conversation_id)';
    END LOOP;
    
    _sql  := _sql  || '
        WHERE  TRUE';
    
    FOREACH i IN ARRAY _user_arr LOOP
        _sql  := _sql  || '
        AND    x' || i || '.user_id = ' || i;
    END LOOP;
    
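    /* uncomment for conversations with exact list of users and no more
    -- the array is embedded as a literal because the dynamic SQL cannot see _user_arr
    _sql  := _sql  || '
        AND    NOT EXISTS (
            SELECT 1
            FROM   conversations_users u
            WHERE  u.conversation_id = c.conversation_id
            AND    u.user_id <> ALL (' || quote_literal(_user_arr::text) || '::int[])
            )';
    */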
    
    -- RAISE NOTICE '%', _sql;
    RETURN QUERY EXECUTE _sql;
    
    END;
    $BODY$ LANGUAGE plpgsql VOLATILE;
    

    Call:

    SELECT * FROM f_conversations_among_users('{1,2}')
    

    The function dynamically builds and executes a query of the form:

    SELECT c.*
    FROM   conversations c
    JOIN   conversations_users x1 USING (conversation_id)
    JOIN   conversations_users x2 USING (conversation_id)
    ...
    WHERE  TRUE
    AND    x1.user_id = 1
    AND    x2.user_id = 2
    ...
    

    This form performed best in an extensive test of queries for relational division.

    You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.

    Either query requires an index like the following to be fast:

    CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
    

    A multi-column primary (or unique) key on (user_id, conversation_id) works just as well, but one on (conversation_id, user_id) (which you may very well have!) would be inferior. You can find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.
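
A minimal sketch of the multi-column alternative, assuming the table has no such constraint yet. The constraint name is made up for illustration:

```sql
-- Hypothetical constraint name; column order (user_id first) is what matters here.
-- The unique constraint creates a matching index implicitly.
ALTER TABLE conversations_users
  ADD CONSTRAINT conversations_users_user_conv_uni UNIQUE (user_id, conversation_id);
```

A unique constraint like this also rules out duplicate (user, conversation) rows, which would otherwise make the joins return the same conversation more than once.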

    I also assume you have a primary key on conversations.conversation_id.

    Can you run a performance test with EXPLAIN ANALYZE on @Alex's query and this function and report your findings?
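
Such a comparison could look roughly like the sketch below. The IN and count() query is not reproduced on this page, so its shape here is an assumption. The table and column names are taken from the function above.

```sql
-- Assumed shape of the IN / count() query (the original is not shown here).
-- Correct only if (conversation_id, user_id) is unique in conversations_users.
EXPLAIN ANALYZE
SELECT c.*
FROM   conversations c
WHERE  c.conversation_id IN (
    SELECT cu.conversation_id
    FROM   conversations_users cu
    WHERE  cu.user_id IN (1, 2)
    GROUP  BY cu.conversation_id
    HAVING count(*) = 2          -- must equal the number of users listed
    );

-- The function from this answer, called with the same user list.
EXPLAIN ANALYZE
SELECT * FROM f_conversations_among_users('{1,2}');
```

For the function call, EXPLAIN ANALYZE reports only a Function Scan node, so compare total execution times. To inspect the inner plan, uncomment the RAISE NOTICE line, copy the generated SQL, and run EXPLAIN ANALYZE on it directly.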

    Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
    If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
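
As a sketch, this is what the generated query would look like for users 1 and 2 with the additional clause added, written out by hand with the array inlined as a literal:

```sql
SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
WHERE  x1.user_id = 1
AND    x2.user_id = 2
AND    NOT EXISTS (              -- no participant outside the list
    SELECT 1
    FROM   conversations_users u
    WHERE  u.conversation_id = c.conversation_id
    AND    u.user_id <> ALL ('{1,2}'::int[])
    );
```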

    Tell me if you need more explanation on the features of the function.
