Users can type in a name and the system should match the text, even if either the user input or the database field contains accented (UTF-8) characters.
You are not using the operator class provided by the pg_trgm module. I would create an index like this:
CREATE INDEX label_lower_unaccent_trgm_idx ON the_table USING gist (lower(unaccent_text(label)) gist_trgm_ops);
Originally, I had a GIN index here, but I later learned that a GiST index is probably even better suited for this kind of query because it can return values sorted by similarity.
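To illustrate the "sorted by similarity" point: pg_trgm provides a distance operator, <->, that a GiST trigram index can use for ordering. The following is a minimal sketch, assuming a table the_table with a text column label, the unaccent_text() wrapper function, and a GiST trigram index on lower(unaccent_text(label)) already exist. The LIMIT value is arbitrary.

```sql
-- Assumption: the_table(label text), unaccent_text(text), and a GiST
-- trigram index on lower(unaccent_text(label)) are already in place.
SELECT label,
       lower(unaccent_text(label)) <-> 'fil' AS distance  -- distance = 1 - similarity
FROM   the_table
ORDER  BY lower(unaccent_text(label)) <-> 'fil'           -- nearest matches first
LIMIT  10;
```

Because the ORDER BY uses the same expression as the index, the planner can walk the GiST index in distance order instead of sorting all matches afterwards.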
Your query has to match the index expression to be able to make use of it.
SELECT label
FROM the_table
WHERE lower(unaccent_text(label)) % 'fil'
ORDER BY similarity(label, 'fil') DESC -- it's ok to use original string here
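One way to check whether a predicate matches the index expression is to compare query plans with EXPLAIN. This is only a sketch: it assumes the_table, unaccent_text(), and a trigram index on lower(unaccent_text(label)) exist. On a small test table the planner may prefer a sequential scan anyway, so sequential scans are discouraged for the session here (for testing only).

```sql
-- For testing only: discourage sequential scans in this session.
SET enable_seqscan = off;

-- Predicate uses the indexed expression, so the trigram index is a candidate.
EXPLAIN
SELECT label
FROM   the_table
WHERE  lower(unaccent_text(label)) % 'fil';

-- Predicate on the bare column does not match the index expression.
EXPLAIN
SELECT label
FROM   the_table
WHERE  label % 'fil';

RESET enable_seqscan;
```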
However, "filbert" and "filé powder" are not actually very similar to "fil" according to the % operator. I suspect what you really want is this:
SELECT label
FROM the_table
WHERE lower(unaccent_text(label)) ~~ '%fil%'
ORDER BY similarity(label, 'fil') DESC; -- it's ok to use original string here
This will find all strings containing the search string, and sort the best matches (by trigram similarity) first.
And the juicy part: the expression can use a GIN or GiST index since PostgreSQL 9.1! I quote the manual on the pg_trgm module:
Beginning in PostgreSQL 9.1, these index types also support index searches for LIKE and ILIKE, for example
If you actually meant to use the % operator:
Have you tried lowering the threshold for the similarity operator % with set_limit():
SELECT set_limit(0.1);
or even lower? The default is 0.3. Just to see whether it's the threshold that filters additional matches.
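To judge whether the threshold is what filters a given row, it can help to look at the raw scores next to the current limit. The sketch below uses only pg_trgm functions (show_limit, similarity, show_trgm) on literal strings, so it needs no table. The three candidate strings are the example labels from this thread, with 'file powder' written unaccented by hand.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Current threshold used by the % operator in this session.
SELECT show_limit() AS current_threshold;

-- Score each candidate against the search term and show its trigrams.
SELECT candidate,
       similarity(candidate, 'fil') AS score,
       show_trgm(candidate)         AS trigrams
FROM  (VALUES ('filbert'), ('fish fillet'), ('file powder')) AS t(candidate)
ORDER  BY score DESC;

-- Note: on PostgreSQL 9.6 and later the same threshold is exposed as a
-- setting: SET pg_trgm.similarity_threshold = 0.1;
```

Any candidate whose score falls below current_threshold is rejected by %, which is why longer labels that merely contain the search term tend to drop out.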
A solution for PostgreSQL 9.1:
-- Install the requisite extensions.
CREATE EXTENSION pg_trgm;
CREATE EXTENSION unaccent;
-- Function fixes STABLE vs. IMMUTABLE problem of the unaccent function.
CREATE OR REPLACE FUNCTION unaccent_text(text)
RETURNS text AS
$BODY$
-- unaccent is STABLE, but indexes must use IMMUTABLE functions.
SELECT unaccent($1);
$BODY$
LANGUAGE sql IMMUTABLE
COST 1;
-- Create an unaccented index.
CREATE INDEX the_table_label_unaccent_idx
ON the_table USING gin (lower(unaccent_text(label)) gin_trgm_ops);
-- Define the matching threshold.
SELECT set_limit(0.175);
-- Test the query (matching against the index expression).
SELECT
label
FROM
the_table
WHERE
lower(unaccent_text(label)) % 'fil'
ORDER BY
similarity(label, 'fil') DESC
Returns "filbert", "fish fillet", and "filé powder".
Without calling SELECT set_limit(0.175); you can use the double tilde (~~) operator:
-- Test the query (matching against the index expression).
SELECT
label
FROM
the_table
WHERE
lower(unaccent_text(label)) ~~ '%fil%'
ORDER BY
similarity(label, 'fil') DESC;
Also returns "filbert", "fish fillet", and "filé powder".