Postgresql: Matching Patterns between Two Columns

后端 未结 3 924
名媛妹妹
名媛妹妹 2021-01-20 00:27

I have two columns say Main and Sub. (they can be of same table or not).

Main is varchar of length 20 and Sub is

3条回答
  •  没有蜡笔的小新
    2021-01-20 00:55

    The answers so far fail to address your question:

    but I want use Like, % , _ etc in my query so that I can loosely match the pattern (that is not all 8 characters).

    It makes hardly any difference whether you use LIKE or = as long as you match the whole string (and there are no wildcard character in your string). To make the search fuzzy, you need to replace part of the pattern, not just add to it.

    For instance, to match on the last 7 (instead of 8) characters of subcolumn:

    SELECT *
    FROM   maintable m
    WHERE  left(maincolumn, 8) LIKE 
           ( '%' || left((SELECT subcolumn FROM subtable WHERE subid = 2), 7));
    

    I use the simpler left() (introduced with Postgres 9.1).
    You could simplify this to:

    SELECT *
    FROM   maintable m
    WHERE  left(maincolumn, 7) =
           (SELECT left(subcolumn,7) FROM subtable WHERE subid = 2);
    

    But you wouldn't if you use the special index I mention further down, because expressions in functional indexes have to matched precisely to be of use.

    You may be interested in the extension pg_tgrm.

    In PostgreSQL 9.1 run once per database:

    CREATE EXTENSION pg_tgrm;
    

    Two reasons:

    • It supplies the similarity operator %. With it you can build a smart similarity search:

      --SELECT show_limit();
      SELECT set_limit(0.5); -- adjust similarity limit for % operator
      
      SELECT *
      FROM maintable m
      WHERE left(maincolumn, 8) %
            (SELECT subcolumn FROM subtable WHERE subid = 2);
      
    • It supplies index support for both LIKE and %

      If read performance is more important than write performance, I suggest you create a functional GIN or GiST index like this:

      CREATE INDEX maintable_maincol_tgrm_idx ON maintable
      USING gist (left(maincolumn, 8) gist_trgm_ops);
      

      This index supports either query. Be aware that it comes with some cost for write operations.
      A quick benchmark for a similar case in this related answer.

提交回复
热议问题