Multiple synonym dictionary matches in PostgreSQL full text searching

时光毁灭记忆、已成空白 submitted on 2020-01-13 11:20:07

Question


I am trying to do full text searching in PostgreSQL 8.3. It worked splendidly, so I added in synonym matching (e.g. 'bob' == 'robert') using a synonym dictionary. That works great too. But I've noticed that it apparently only allows a word to have one synonym. That is, 'al' cannot be 'albert' and 'allen'.

Is this correct? Is there any way to have multiple dictionary matches in a PostgreSQL synonym dictionary?

For reference, here is my sample dictionary file:

bob    robert
bobby  robert
al     alan
al     albert
al     allen

And the SQL that creates the full text search config:

CREATE TEXT SEARCH DICTIONARY nickname (TEMPLATE = synonym, SYNONYMS = nickname);
CREATE TEXT SEARCH CONFIGURATION dxp_name (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION dxp_name ALTER MAPPING FOR asciiword WITH nickname, simple;

What am I doing wrong? Thanks!


Answer 1:


That's a limitation in how the synonyms work. What you can do is turn it around as in:

bob    robert
bobby  robert
alan   al
albert al
allen  al

It should give the same end result, which is that a search for either one of those will match the same thing.
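
One way to check the reversed file is sketched below. It assumes the nickname dictionary and dxp_name configuration from the question already exist, and that nickname.syn in $SHAREDIR/tsearch_data has been edited to the reversed form above. The sample names in the queries are made up for illustration.

```sql
-- Dummy update so that existing sessions re-read the edited synonym file.
ALTER TEXT SEARCH DICTIONARY nickname (SYNONYMS = nickname);

-- Each full name should now normalize to the shared lexeme.
SELECT ts_lexize('nickname', 'albert');
SELECT ts_lexize('nickname', 'allen');

-- 'al' has no entry of its own in the reversed file, so the synonym
-- dictionary passes it on to the next dictionary in the mapping (simple).
SELECT to_tsvector('dxp_name', 'Al Smith');

-- A search for the nickname should match a document that uses a full name.
SELECT to_tsvector('dxp_name', 'Albert Smith') @@ to_tsquery('dxp_name', 'al');
```

Because both the document and the query go through the same configuration, it does not matter which spelling each side uses as long as every variant ends up at the same lexeme.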




Answer 2:


A dictionary must define a functional relationship between words and lexemes; otherwise it won't know which lexeme to return when you lexize. In your example, al maps to three different values, which makes the mapping multi-valued, so the lexize function doesn't know what to return. As Magnus shows, you can instead lexize from the proper names alan, albert, and allen to the nickname al.

Remember, however, that the point of an FTS dictionary is not to perform transformations per se but to allow efficient indexing on semantically relevant words. This means that the lexeme need not resemble the original entry in any linguistic sense. Although you're right that a many-to-many relationship is impossible to define, do you really need one? For example, to resolve your vin example:

vin        vin
vincent    vin
vincenzo   vin
vinnie     vin

but you could also do this:

vin        grob
vincent    grob
vincenzo   grob
vinnie     grob

and get the same effect (although why you'd want to is another story).

Thus, if you were to parse a document containing, say, 11 occurrences of the various forms of the name Vincent, the to_tsvector function would return a single lexeme with 11 positions: vin in the former case and grob in the latter.
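
A minimal sketch of that collapsing behaviour follows. The names vin_syn, vin_names, and vin_cfg are hypothetical, and the sketch assumes the first vin file above has been saved as vin_names.syn in $SHAREDIR/tsearch_data.

```sql
-- Hypothetical dictionary and configuration built the same way as in the question.
CREATE TEXT SEARCH DICTIONARY vin_syn (TEMPLATE = synonym, SYNONYMS = vin_names);
CREATE TEXT SEARCH CONFIGURATION vin_cfg (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION vin_cfg
    ALTER MAPPING FOR asciiword WITH vin_syn, simple;

-- All three name variants should be stored under the single lexeme 'vin',
-- each occurrence recorded as a separate position in the tsvector.
SELECT to_tsvector('vin_cfg', 'Vin met Vincent and Vinnie');

-- Any variant in the query is reduced to the same lexeme before matching.
SELECT to_tsvector('vin_cfg', 'Vin met Vincent and Vinnie')
       @@ to_tsquery('vin_cfg', 'vincenzo');
```

Swapping the file for the grob version should change only the stored lexeme, not which documents match.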




Answer 3:


The 8.4 documentation describes dict_xsyn, an extended synonym dictionary module that can replace a word with a group of its synonyms. Maybe that will be helpful?

http://www.postgresql.org/docs/8.4/interactive/dict-xsyn.html
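
As a rough sketch of how that module might be applied to the nickname case: the rules file name, its contents, and the dictionary name nickname_xsyn below are assumptions for illustration, and the installation step depends on the server version.

```sql
-- Hypothetical rules file $SHAREDIR/tsearch_data/nickname.rules containing
-- one line per word, with the word followed by all of its synonyms:
--   al alan albert allen

-- On 9.1 and later the module is installed as an extension; on 8.4 the
-- contrib SQL script shipped with the module has to be run instead.
CREATE EXTENSION dict_xsyn;

-- A dictionary built on the module's template, keeping the original word.
CREATE TEXT SEARCH DICTIONARY nickname_xsyn (
    TEMPLATE = xsyn_template,
    RULES = 'nickname',
    KEEPORIG = true
);

-- With KEEPORIG enabled this should return the word together with its synonyms.
SELECT ts_lexize('nickname_xsyn', 'al');

-- Use it ahead of simple in the configuration from the question.
ALTER TEXT SEARCH CONFIGURATION dxp_name
    ALTER MAPPING FOR asciiword WITH nickname_xsyn, simple;
```

Unlike the plain synonym template, this lets one word expand to several lexemes, which is the one-to-many behaviour the question asks about.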



Source: https://stackoverflow.com/questions/1208927/multiple-synonym-dictionary-matches-in-postgresql-full-text-searching