SQL Server Full-Text Search for exact match with fallback

大兔子大兔子 提交于 2019-12-01 15:18:06

You should use full text search CONTAINSTABLE to find the top 100 (possibly 200) candidate results and then order the results you found using your own criteria.

It sounds like you'd like to ORDER BY

  1. exact match of the phrase (=)
  2. the fully matched phrase (LIKE)
  3. higher value for the Popularity column
  4. the Rank from the CONTAINSTABLE

But you can toy around with the exact order you prefer.

In SQL that looks something like:

DECLARE @title varchar(255)
SET @title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @title2 varchar(255)
SET @title2 = REPLACE(@title, '"', '')

SELECT
    m.ID,
    m.title,
    m.Popularity,
    k.Rank
FROM Movies m
INNER JOIN CONTAINSTABLE(Movies, title, @title, 100) as [k]
    ON m.ID = k.[Key]
ORDER BY 
  CASE WHEN m.title = @title2 THEN 0 ELSE 1 END,
  CASE WHEN m.title LIKE @title2 THEN 0 ELSE 1 END,
  m.popularity desc,
  k.rank

See SQLFiddle

This will give you the movies that contain the exact phrase "Toy Story", ordered by their popularity.

SELECT
    m.[ID],
    m.[Popularity],
    k.[Rank]
FROM [dbo].[Movies] m
INNER JOIN CONTAINSTABLE([dbo].[Movies], [Title], N'"Toy Story"') as [k]
    ON m.[ID] = k.[Key]
ORDER BY m.[Popularity]

Note the above would also give you "The Goonies Return" if you searched "The Goonies".

If got the feeling you don't really like the fuzzy part of the full text search but you do like the performance part.

Maybe is this a path: if you insist on getting the EXACT match before a weighted match you could try to hash the value. For example 'Toy Story' -> bring to lowercase -> toy story -> Hash into 4de2gs5sa (with whatever hash you like) and perform a search on the hash.

In Oracle I've used UTL_MATCH for similar purposes. (http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_match.htm)

Even though using the Jaro Winkler algorithm, for instance, might take awhile if you compare the title column from table 1 and table 2, you can improve performance if you partially join the 2 tables. I have in some cases compared person names on table 1 with table 2 using Jaro Winkler, but limited results not just above a certain Jaro Winkler threshold, but also to names between the 2 tables where the first letter is the same. For instance I would compare Albert with Aden, Alfonzo, and Alberto, using Jaro Winkler, but not Albert and Frank (limiting the number of situations where the algorithm needs to be used).

Jaro Winkler may actually be suitable for movie titles as well. Although you are using SQL server (can't use the utl_match package) it looks like there is a free library called "SimMetrics" which has the Jaro Winkler algorithm among other string comparison metrics. You can find detail on that and instructions here: http://anastasiosyal.com/POST/2009/01/11/18.ASPX?#simmetrics

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!