Question
I use PostgreSQL 11.8, running in the Docker image postgres:11-alpine. I want to create a custom full text search dictionary for expressions made up of several words, so that, for example, hello world becomes hw.
First of all, I have a custom full text search configuration my_swedish:
CREATE TEXT SEARCH CONFIGURATION my_swedish (
COPY = swedish
);
ALTER TEXT SEARCH CONFIGURATION my_swedish
DROP MAPPING FOR hword_asciipart;
ALTER TEXT SEARCH CONFIGURATION my_swedish
DROP MAPPING FOR hword_part;
For this configuration I want to create and use a dictionary. Following the PostgreSQL manual, I ran:
CREATE TEXT SEARCH DICTIONARY thesaurus_my_swedish (
TEMPLATE = thesaurus,
DictFile = thesaurus_my_swedish,
Dictionary = pg_catalog.swedish_stem
);
and got this error:
ERROR: could not open thesaurus file "/usr/local/share/postgresql/tsearch_data/thesaurus_my_swedish.ths": No such file or directory
I then created the file manually:
touch /usr/local/share/postgresql/tsearch_data/thesaurus_astro.ths
then:
ALTER TEXT SEARCH CONFIGURATION my_swedish
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
WITH thesaurus_my_swedish;
ERROR: text search configuration "my_swedish" does not exist
When I changed it to the default swedish configuration:
ALTER TEXT SEARCH CONFIGURATION swedish
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
WITH thesaurus_my_swedish;
I got the error:
ERROR: text search dictionary "thesaurus_my_swedish" does not exist
How do I correctly create a thesaurus dictionary for my custom text search configuration?
UPDATE
I added this rule to my file thesaurus_my_swedish.ths:
hello world : hw
Now
SELECT to_tsvector('my_swedish', 'hello world');
returns 'hw':1.
But what about other words? to_tsvector('my_swedish', 'hello test') returns an empty result, whereas it should return the same as the default swedish configuration:
SELECT to_tsvector('swedish', 'hello test');
'hello':1 'test':2
What is wrong?
UPDATE
I understand now: I need to add pg_catalog.swedish_stem to the mapping too:
ALTER TEXT SEARCH CONFIGURATION my_swedish
ALTER MAPPING FOR asciihword, asciiword, hword, word
WITH thesaurus_my_swedish, pg_catalog.swedish_stem;
Answer 1:
You did everything right, with a few exceptions:
thesaurus_my_swedish.ths should not be empty, but contain rules like this (taken from your example):
hello world : hw
You should use the new dictionary for all token types that now use swedish_stem, that is:
ALTER TEXT SEARCH CONFIGURATION my_swedish
ALTER MAPPING FOR asciihword, asciiword, hword, word
WITH thesaurus_my_swedish, swedish_stem;
This error is mysterious and should not have happened:
ERROR: text search configuration "my_swedish" does not exist
Perhaps you connected to the wrong database, or you dropped the configuration again, or it is not on the search_path and you have to qualify it with its schema. Use \dF *.* in psql to list all existing configurations.
Of course you have to create the dictionary before you can use it in a text search configuration.
Don't modify the configurations in pg_catalog; such modifications would be lost after an upgrade.
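Putting the points above together, the whole sequence might look like the sketch below. It assumes the file thesaurus_my_swedish.ths already exists in the server's tsearch_data directory and contains the rule hello world : hw. The DROP MAPPING statements from the question are left out for brevity. The two catalog queries at the end are an addition not found in the original answer: they are a plain-SQL way to see which schema each configuration and dictionary lives in, similar in spirit to \dF *.* in psql.

```sql
-- Assumption: thesaurus_my_swedish.ths is already in tsearch_data
-- and contains at least this rule:
--   hello world : hw

-- 1. Create the custom configuration as a copy of the built-in one,
--    instead of modifying pg_catalog.swedish.
CREATE TEXT SEARCH CONFIGURATION my_swedish (
    COPY = swedish
);

-- 2. Create the dictionary BEFORE referencing it in any mapping.
CREATE TEXT SEARCH DICTIONARY thesaurus_my_swedish (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_my_swedish,
    Dictionary = pg_catalog.swedish_stem
);

-- 3. Map every token type that used swedish_stem to the thesaurus first,
--    with swedish_stem as the fallback for tokens the thesaurus does not match.
ALTER TEXT SEARCH CONFIGURATION my_swedish
    ALTER MAPPING FOR asciihword, asciiword, hword, word
    WITH thesaurus_my_swedish, swedish_stem;

-- 4. Check the result. The question reports 'hw':1 for the first query;
--    the second should now fall through to swedish_stem.
SELECT to_tsvector('my_swedish', 'hello world');
SELECT to_tsvector('my_swedish', 'hello test');

-- 5. If a "does not exist" error appears, list configurations and
--    dictionaries together with their schemas in the current database.
SELECT n.nspname AS schema_name, c.cfgname AS configuration_name
FROM pg_catalog.pg_ts_config AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.cfgnamespace
ORDER BY 1, 2;

SELECT n.nspname AS schema_name, d.dictname AS dictionary_name
FROM pg_catalog.pg_ts_dict AS d
JOIN pg_catalog.pg_namespace AS n ON n.oid = d.dictnamespace
ORDER BY 1, 2;
```

If my_swedish or thesaurus_my_swedish is missing from these listings, the object was probably created in a different database or was never created, which would match the "does not exist" errors in the question.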
Source: https://stackoverflow.com/questions/62652659/how-to-correctly-create-thesaurus-dictionary-for-my-custom-text-search-configura