Finding similar posts with PostgreSQL

Submitted by 魔方 西西 on 2019-12-24 02:55:15

Question


I have a table posts:

CREATE TABLE posts (
  id serial primary key,
  content text
);

When a user submits a post, how can I compare it with the existing posts and find similar ones?
I'm looking for something like what StackOverflow does with its "Similar Questions" feature.


Answer 1:


While Text Search is an option, it is not primarily meant for this type of search. Its typical use case is finding words in a document based on dictionaries and stemming, not comparing whole documents.

I am sure StackOverflow has put some smarts into the similarity search, as this is not a trivial matter.

You can get halfway decent results with the similarity function and operators provided by the pg_trgm module:

SELECT content, similarity(content, 'grand new title asking foo') AS sim_score
FROM   posts
WHERE  content  % 'grand new title asking foo'
ORDER  BY 2 DESC, content;

Be sure to have a trigram GiST index (operator class gist_trgm_ops) on content for this; without one, the % operator has to scan every row.
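A minimal sketch of the setup for the query above, assuming PostgreSQL 9.1 or later and that you are allowed to install extensions. The index name posts_content_trgm_idx is made up for illustration:

```sql
-- Install the trigram module once per database (needs sufficient privileges).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GiST index so the % operator can use an index scan.
-- The index name is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX posts_content_trgm_idx
ON     posts USING gist (content gist_trgm_ops);

-- Alternative: a GIN index with the gin_trgm_ops operator class.
-- CREATE INDEX posts_content_trgm_idx ON posts USING gin (content gin_trgm_ops);
```

The % operator only returns rows whose similarity exceeds the current threshold (0.3 by default), which you can check with SELECT show_limit();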

But you'll probably have to do more. You could combine it with Text Search after identifying keywords in the new content.
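One way such a combination might look, as a rough sketch rather than a tuned solution: filter candidates with Text Search on a few keywords, then rank them by trigram similarity. The keyword list ('grand | title | foo'), the 'english' configuration and the LIMIT are assumptions for illustration; how you pick keywords from the new post is up to you.

```sql
-- Requires the pg_trgm module for similarity().
-- Keywords and text search configuration below are hypothetical.
SELECT id, content
     , similarity(content, 'grand new title asking foo') AS sim_score
FROM   posts
WHERE  to_tsvector('english', content)
       @@ to_tsquery('english', 'grand | title | foo')  -- any keyword matches
ORDER  BY sim_score DESC
LIMIT  10;
```

Joining the keywords with | (OR) keeps the candidate set broad; using & (AND) instead would require every keyword to appear in a post.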




Answer 2:


You need to use Full Text Search in Postgres.

http://www.postgresql.org/docs/9.1/static/textsearch-intro.html
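For illustration, a Full Text Search query against the posts table from the question could look roughly like this. The 'english' configuration, the search terms and the index name posts_content_fts_idx are assumptions, not something the question specifies:

```sql
-- Expression index so the @@ match below does not recompute
-- to_tsvector() for every row (index name is hypothetical).
CREATE INDEX posts_content_fts_idx
ON     posts USING gin (to_tsvector('english', content));

-- Rank posts that share at least one term with the new post.
SELECT id, content
     , ts_rank(to_tsvector('english', content), query) AS rank
FROM   posts
     , to_tsquery('english', 'grand | title | foo') AS query
WHERE  to_tsvector('english', content) @@ query
ORDER  BY rank DESC
LIMIT  10;
```

The expression in the WHERE clause has to match the indexed expression, including the configuration name, for the index to be used.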



Source: https://stackoverflow.com/questions/17842196/finding-similar-posts-with-postgresql
