Similar UTF-8 strings for autocomplete field

为君一笑 提交于 2019-11-28 11:46:40
Erwin Brandstetter

You are not using the operator class provided by the pg_trgm module. I would create an index like this:

CREATE INDEX label_Lower_unaccent_trgm_idx
ON test_trgm USING gist (lower(unaccent_text(label)) gist_trgm_ops);

Originally, I had a GIN index here, but I later learned that a GiST is probably even better suited for this kind of query because it can return values sorted by similarity. More details:

Your query has to match the index expression to be able to make use of it.

SELECT label
FROM   the_table
WHERE  lower(unaccent_text(label)) % 'fil'
ORDER  BY similarity(label, 'fil') DESC -- it's ok to use original string here

However, "filbert" and "filé powder" are not actually very similar to "fil" according to the % operator. I suspect what you really want is this:

SELECT label
FROM   the_table
WHERE  lower(unaccent_text(label)) ~~ '%fil%'
ORDER  BY similarity(label, 'fil') DESC -- it's ok to use original string here

This will find all strings containing the search string, and sort the best matches according to the % operator first.

And the juicy part: the expression can use a GIN or GiST index since PostgreSQL 9.1! I quote the manual on the pg_trgm moule:

Beginning in PostgreSQL 9.1, these index types also support index searches for LIKE and ILIKE, for example


If you actually meant to use the % operator:

Have you tried lowering the threshold for the similarity operator % with set_limit():

SELECT set_limit(0.1);

or even lower? The default is 0.3. Just to see whether its the threshold that filters additional matches.

A solution for PostgreSQL 9.1:

-- Install the requisite extensions.
CREATE EXTENSION pg_trgm;
CREATE EXTENSION unaccent;

-- Function fixes STABLE vs. IMMUTABLE problem of the unaccent function.
CREATE OR REPLACE FUNCTION unaccent_text(text)
  RETURNS text AS
$BODY$
  -- unaccent is STABLE, but indexes must use IMMUTABLE functions.
  SELECT unaccent($1); 
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 1;

-- Create an unaccented index.
CREATE INDEX the_table_label_unaccent_idx
ON the_table USING gin (lower(unaccent_text(label)) gin_trgm_ops);

-- Define the matching threshold.
SELECT set_limit(0.175);

-- Test the query (matching against the index expression).
SELECT
  label
FROM
  the_table
WHERE
  lower(unaccent_text(label)) % 'fil'
ORDER BY
  similarity(label, 'fil') DESC 

Returns "filbert", "fish fillet", and "filé powder".

Without calling SELECT set_limit(0.175);, you can use the double tilde (~~) operator:

-- Test the query (matching against the index expression).
SELECT
  label
FROM
  the_table
WHERE
  lower(unaccent_text(label)) ~~ 'fil'
ORDER BY
  similarity(label, 'fil') DESC 

Also returns "filbert", "fish fillet", and "filé powder".

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!