Match a phrase ending in a prefix with full text search

Submitted by 我怕爱的太早我们不能终老 on 2019-12-18 13:24:59

Question


I'm looking for a way to emulate something like SELECT * FROM table WHERE attr LIKE '%text%' using a tsvector in PostgreSQL.

I've created a tsvector attribute without using a dictionary. Now, a query like ...

SELECT title
FROM tbl
WHERE title_tsv @@ to_tsquery('ph:*');

... would return all titles like 'Physics', 'PHP', etc. But how can I create a query that returns all records whose title starts with 'Zend Fram' (which should return, for instance, 'Zend Framework')?

Of course, I could use something like:

SELECT title
FROM tbl
WHERE title_tsv @@ to_tsquery('zend')
AND   title_tsv @@ to_tsquery('fram:*');

However, this seems a little awkward.

So, the question is: is there a way to formulate the query given above using something like:

SELECT title
FROM tbl
WHERE title_tsv @@ to_tsquery('zend fram:*');

Answer 1:


SELECT title
FROM tbl
WHERE title_tsv @@ to_tsquery('zend') AND
      title_tsv @@ to_tsquery('fram:*');

is equivalent to:

SELECT title
FROM tbl
WHERE title_tsv @@ to_tsquery('zend & fram:*')

but of course that finds "Zend has no framework" as well.

You could also add a regular-expression match against title as an extra condition after the tsquery match. However, you would have to check with EXPLAIN ANALYZE that the regex is evaluated after the tsquery condition rather than before it.
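To make the difference concrete, here is a simplified Python model (not PostgreSQL's implementation) of the two conditions. The `zend & fram:*` test only checks that both lexemes occur somewhere in the title. The regular-expression recheck on the raw title enforces that it actually starts with "Zend Fram". The model assumes lowercase `\w+` tokenization with no stemming or stop words, roughly like the `simple` configuration. All function names are hypothetical.

```python
import re

def lexemes(title: str) -> list[str]:
    # Rough stand-in for to_tsvector('simple', title): lowercase word tokens.
    return re.findall(r"\w+", title.lower())

def and_prefix_match(title: str, word: str, prefix: str) -> bool:
    # Models "word & prefix:*": both must appear, in any order, at any distance.
    toks = lexemes(title)
    return word in toks and any(t.startswith(prefix) for t in toks)

# Recheck on the raw title, comparable to: title ~* '^zend\s+fram'
STARTS_WITH = re.compile(r"^zend\s+fram", re.IGNORECASE)

def two_stage_match(title: str) -> bool:
    # Stage 1: cheap candidate filter (the index-supported tsquery in SQL).
    # Stage 2: exact regex check, applied only to the candidates.
    return and_prefix_match(title, "zend", "fram") and bool(STARTS_WITH.search(title))

titles = ["Zend Framework", "Zend has no framework", "Physics"]
print([t for t in titles if and_prefix_match(t, "zend", "fram")])  # ['Zend Framework', 'Zend has no framework']
print([t for t in titles if two_stage_match(t)])                   # ['Zend Framework']
```

The ordering matters for the same reason as in SQL. The cheap lexeme test should narrow the rows before the more expensive regex runs.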




Answer 2:


Postgres 9.6 introduced phrase search capabilities for full text search, so this works now:

SELECT title
FROM  tbl
WHERE title_tsv @@ to_tsquery('zend <-> fram:*');

Here <-> is the FOLLOWED BY operator.

It finds 'foo Zend framework bar' or 'Zend frames', but not 'foo Zend has no framework bar'.

Quoting the release notes for Postgres 9.6:

A phrase-search query can be specified in tsquery input using the new operators <-> and <N>. The former means that the lexemes before and after it must appear adjacent to each other in that order. The latter means they must be exactly N lexemes apart.
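The semantics quoted above can be modeled in a few lines of Python. This is a simplified sketch, not PostgreSQL's code, and it rests on three assumptions:

- lexeme positions are 1-based;
- tokens are lowercase `\w+` words, with no stemming or stop-word removal;
- `a <N> b` means "some position of b equals some position of a plus N", so `<->` is `<1>`.

The names `positions`, `matching_positions` and `followed_by` are hypothetical.

```python
import re

def positions(title: str) -> dict[str, list[int]]:
    # Map each lexeme to its 1-based positions, like a tsvector's position list.
    pos: dict[str, list[int]] = {}
    for i, tok in enumerate(re.findall(r"\w+", title.lower()), start=1):
        pos.setdefault(tok, []).append(i)
    return pos

def matching_positions(pos: dict[str, list[int]], term: str) -> set[int]:
    # "term:*" is a prefix match; a plain term must match a lexeme exactly.
    if term.endswith(":*"):
        prefix = term[:-2]
        return {p for lex, ps in pos.items() if lex.startswith(prefix) for p in ps}
    return set(pos.get(term, []))

def followed_by(title: str, left: str, right: str, n: int = 1) -> bool:
    # Models "left <N> right": right occurs exactly n positions after left.
    pos = positions(title)
    rights = matching_positions(pos, right)
    return any(p + n in rights for p in matching_positions(pos, left))

for t in ["foo Zend framework bar", "Zend frames", "foo Zend has no framework bar"]:
    print(t, followed_by(t, "zend", "fram:*"))
```

This reproduces the examples above. The first two titles match `zend <-> fram:*`, and the third does not. The third would match `zend <3> fram:*`, because "framework" sits three positions after "zend".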

For best performance, support the query with a GIN index:

CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (title_tsv);

Or don't store title_tsv in the table at all (bloating it and complicating writes). You can use an expression index instead:

CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (to_tsvector('english', title));

You need to specify the text search configuration (often language-specific) to make the expression immutable, which an index expression must be. Then adapt the query accordingly, so that it matches the indexed expression:

...
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'zend <-> fram:*');
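When the search text comes from a user, the `zend <-> fram:*` string has to be built in application code. The Python sketch below is one way to do it, assuming that splitting the input into `\w+` words is acceptable. It works in three steps:

1. It quotes each word, using the same `'zend':*` form that answer 3 produces.
2. It joins the words with `<->`.
3. It marks only the last, possibly incomplete, word as a prefix.

The function name is hypothetical.

```python
import re

def phrase_prefix_query(text: str) -> str:
    # Hypothetical helper: "Zend Fram" -> "'zend' <-> 'fram':*"
    words = re.findall(r"\w+", text.lower())  # drops quotes and tsquery operators
    if not words:
        raise ValueError("search text contains no words")
    quoted = [f"'{w}'" for w in words]
    quoted[-1] += ":*"  # only the last word is treated as a prefix
    return " <-> ".join(quoted)

print(phrase_prefix_query("Zend Fram"))  # 'zend' <-> 'fram':*
```

Pass the resulting string as a bound query parameter to `to_tsquery('english', ...)` rather than splicing it into the SQL text. Note that the configuration still normalizes (stems) the quoted words.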



Answer 3:


Not a pretty solution, but it should do the job:

psql=# SELECT regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g') ;
   regexp_replace    
---------------------
 'zend':* & 'fram':*
(1 row)

It can be used like:

psql=# SELECT title FROM tbl WHERE title_tsv @@ to_tsquery(regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g'));

How this works:

  1. casts the plain tsquery to a string: cast(plainto_tsquery('Zend Fram') as text)
  2. uses a regex to append the :* prefix marker to each search term: regexp_replace(..., E'(\'\\w+\')', E'\\1:*', 'g')
  3. converts the result back to a tsquery with to_tsquery(...), which, unlike plainto_tsquery, interprets the :* operator
  4. and uses it in the search expression: SELECT title FROM tbl WHERE title_tsv @@ ...
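The regex rewrite in step 2 can be tried outside the database. Below, Python's `re.sub` applies the same pattern and replacement to the text form of `plainto_tsquery('Zend Fram')`. That text form, `'zend' & 'fram'`, is hard-coded here, so no database is needed. The example also shows a limitation: every term becomes a prefix, and the terms stay joined with `&`. Word order and adjacency are therefore not enforced.

```python
import re

# Text form of plainto_tsquery('Zend Fram'), hard-coded for this sketch.
plain = "'zend' & 'fram'"

# Same pattern/replacement as E'(\'\\w+\')' and E'\\1:*' in the SQL:
# capture each quoted word as group 1 and append the :* prefix marker.
prefixed = re.sub(r"('\w+')", r"\1:*", plain)
print(prefixed)  # 'zend':* & 'fram':*
```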



Answer 4:


There's a way to do it in Postgres using trigrams (the pg_trgm extension) with GIN or GiST indexes. Kristo Kaiv's article "Substring Search" has a simple example, though with some rough edges.
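Trigram indexing for `LIKE '%text%'` splits text into three-character sequences. The index narrows the candidates to rows that contain every trigram of the search string, and the real substring test then rechecks them. The Python sketch below is a simplified model of that idea, not pg_trgm's code. It assumes pg_trgm-style word trigrams: lowercase text, non-alphanumeric characters as separators, and each word padded with two spaces in front and one behind. It uses unpadded trigrams for the search string, since a `%...%` pattern can start and end mid-word. pg_trgm's real extraction for LIKE patterns differs in details. All names are hypothetical.

```python
import re

def trigrams(text: str) -> set[str]:
    # Simplified model of pg_trgm word trigrams: each word padded as "  word ".
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def needle_trigrams(needle: str) -> set[str]:
    # Unpadded 3-character windows of each word in the search string.
    out: set[str] = set()
    for word in re.findall(r"[^\W_]+", needle.lower()):
        out.update(word[i:i + 3] for i in range(len(word) - 2))
    return out

def substring_search(titles: list[str], needle: str) -> list[str]:
    wanted = needle_trigrams(needle)
    # Index step (modeled): keep titles whose trigrams include every wanted trigram.
    candidates = [t for t in titles if wanted <= trigrams(t)]
    # Recheck step: confirm the actual case-insensitive substring match.
    return [t for t in candidates if needle.lower() in t.lower()]

print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
titles = ["Zend Framework", "Zend has no framework", "Physics"]
print(substring_search(titles, "zend fram"))  # ['Zend Framework']
```

"Zend has no framework" survives the trigram step but fails the recheck, which is why the recheck is needed. Words shorter than three characters contribute no trigrams, so such searches fall back to checking every row.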



Source: https://stackoverflow.com/questions/6155592/match-a-phrase-ending-in-a-prefix-with-full-text-search
