What is the SQL used to do a search similar to “Related Questions” on Stack Overflow?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-18 12:34:44

Question


I am trying to implement a feature similar to the "Related Questions" on Stack Overflow.

How do I go about writing the SQL statement that will search the Title and Summary fields of my database for similar questions?

Suppose my question is: "What is the SQL used to do a search similar to "Related Questions" on Stack Overflow".

The steps I can think of are:

  1. Strip the quotation marks
  2. Split the sentence into an array of words and run a SQL search on each word.

If I do it this way, I am guessing that I wouldn't get any meaningful results. I am not sure whether Full Text Search is enabled on the server, so I am not using it. Would there be an advantage to using Full Text Search?
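The two steps listed above can be sketched outside the database. This is a minimal Python illustration, not a recommended design: the function name `split_into_words` and the rule for what counts as a word are assumptions, and the resulting words would still have to be passed to whatever SQL search is used.

```python
import re

def split_into_words(title: str) -> list[str]:
    """Strip quotation marks, then split a title into lowercase words."""
    # Step 1: replace straight and curly double quotes with spaces.
    unquoted = re.sub(r'["\u201c\u201d]', " ", title)
    # Step 2: treat runs of letters, digits and apostrophes as words.
    return re.findall(r"[a-z0-9']+", unquoted.lower())

words = split_into_words(
    'What is the SQL used to do a search similar to "Related Questions" on Stackoverflow'
)
```

Most of the resulting words ("what", "is", "the", "to") are very common, which is one reason a per-word search tends to return noisy results.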

I found a similar question, but it had no answer.

I am using SQL Server 2005.


Answer 1:


Check out this podcast.

One of our major performance optimizations for the “related questions” query is removing the top 10,000 most common English dictionary words (as determined by Google search) before submitting the query to the SQL Server 2008 full text engine. It’s shocking how little is left of most posts once you remove the top 10k English dictionary words. This helps limit and narrow the returned results, which makes the query dramatically faster.
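The filtering step described in the quote can be sketched as follows. This is only an illustration: the tiny `COMMON_WORDS` set is a hypothetical stand-in for the 10,000-word list, which is not given here, and `remove_common_words` is an assumed helper name.

```python
# Hypothetical stand-in for the top-10,000 common-word list.
COMMON_WORDS = frozenset({
    "what", "is", "the", "used", "to", "do", "a", "similar",
    "on", "how", "i", "in", "of", "and", "for",
})

def remove_common_words(text: str, common_words=COMMON_WORDS) -> str:
    """Drop common words so only distinctive terms reach the full text engine."""
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (w.strip("\"'?.,!:;()") for w in text.lower().split())
    return " ".join(w for w in words if w and w not in common_words)

query = remove_common_words(
    'What is the SQL used to do a search similar to "Related Questions" on Stackoverflow'
)
```

The shortened string is what would then be submitted to the full text query, in place of the original title.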




Answer 2:


They probably relate based on tags that are added to the questions...




Answer 3:


After enabling Full Text Search on my SQL Server 2005 instance, I am using the following stored procedure to search for text.

ALTER PROCEDURE [dbo].[GetSimilarIssues]
(
    @InputSearch varchar(255)
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Pass the raw text: FREETEXT word-breaks and stems it itself.
    -- Wrapping it in double quotes would force an exact phrase match.
    SELECT PostId, Summary, [Description], Created
    FROM Issue
    WHERE FREETEXT(Summary, @InputSearch);
END
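If prefix matching on individual words is wanted instead, SQL Server's CONTAINS predicate accepts prefix terms of the form "word*" combined with OR. The sketch below, in Python, only builds such a search condition string; `build_contains_condition` is a hypothetical helper, and the result is meant to be passed as a parameter to a CONTAINS query, not concatenated into the SQL text.

```python
def build_contains_condition(words: list[str]) -> str:
    """Build a CONTAINS search condition such as '"sql*" OR "search*"'."""
    # Remove embedded double quotes so every term stays a well-formed
    # quoted prefix term, and skip words that end up empty.
    cleaned = [w.replace('"', "") for w in words]
    return " OR ".join(f'"{w}*"' for w in cleaned if w)

condition = build_contains_condition(["sql", "search", "related"])
```

The value of `condition` could then be supplied as the second argument of CONTAINS(Summary, @SearchCondition), where @SearchCondition is an assumed parameter name.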



Answer 4:


I'm pretty sure it would be most efficient to implement the feature based on the tags associated with each post.
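A tag-based approach could look roughly like the sketch below. It is an assumption about how such a feature might work, not a description of what Stack Overflow does: posts are ranked by the Jaccard overlap of their tag sets, and the name `related_by_tags` and the dictionary layout are made up for illustration.

```python
def related_by_tags(target_tags, posts, limit=10):
    """Rank posts by Jaccard overlap between their tags and target_tags."""
    target = set(target_tags)
    scored = []
    for post_id, tags in posts.items():
        tags = set(tags)
        union = target | tags
        score = len(target & tags) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, post_id))
    # Highest score first; ties are broken by post id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored[:limit]]

posts = {1: ["sql", "search"], 2: ["sql"], 3: ["python"]}
related = related_by_tags(["sql", "search"], posts)
```

Posts that share no tag with the target are left out entirely, so a question with unusual tags may get few or no related results.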




Answer 5:


It's probably done using a full text search, which matches similar words and phrases. I've used it in MySQL and SQL Server with decent success using the out-of-the-box functionality.

You can find more on MySQL full text searches at:

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

Or just google Full Text search and you will find a lot of information.




Answer 6:


It looks keyword-based: the title you enter is queried against the titles and content of other questions. This is probably easier (and more appropriate) to do in Lucene (or similar) than in a relational database.




Answer 7:


I'd say it's probably a full text search on the question title and the question content and answers as well using the individual words (not the whole title) you enter. Then, using the ranking features of full-text, the top 10 or so questions that rank the highest are displayed.
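The idea of searching on individual words and keeping the top 10 can be illustrated with a toy ranking. This Python sketch scores by the number of distinct shared words, which is a deliberately simple stand-in and not the ranking a full-text engine computes; `top_matches` and the integer document ids are assumptions.

```python
import heapq

def top_matches(query_words, documents, limit=10):
    """Return ids of the documents sharing the most distinct words with the query."""
    query = {w.lower() for w in query_words}
    scores = {}
    for doc_id, text in documents.items():
        overlap = query & set(text.lower().split())
        if overlap:
            scores[doc_id] = len(overlap)
    # Highest score first; on ties the smaller id comes first.
    return heapq.nlargest(limit, scores, key=lambda doc_id: (scores[doc_id], -doc_id))

documents = {
    1: "sql full text search",
    2: "search in python",
    3: "cooking pasta",
}
best = top_matches(["SQL", "search"], documents)
```

A real full-text engine also weighs how rare each word is, which simple overlap counting ignores.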

As tydok pointed out, it looks like they are using full-text searching (I couldn't imagine any other way).

See the MSDN reference on Full-Text Searching. Nailing down the specific query used probably isn't going to happen.




Answer 8:


The SQL may very well be just "SELECT * FROM questions;". I find it hard to imagine that the algorithm for finding similar questions is implemented in SQL.



Source: https://stackoverflow.com/questions/937059/what-is-the-sql-used-to-do-a-search-similar-to-related-questions-on-stackoverf
