SQL Server Weighted Full Text Search

删除回忆录丶 提交于 2021-01-20 16:51:28

问题


Currently I have a table that I search upon 4 fields, FirstName, LastName, MiddleName, And AKA's. I currently have a CONTAINSTABLE search for the rows and it works. Not well but it works. Now I want to make the First Name weighted higher and middle name lower.

I found the command ISABOUT but that seems pretty worthless if I have to do it by word not column (hopefully I understood this wrong). This is not an option if its by word because I do not know how many words the user will enter.

I found the thread here that talks about this same solution however I was unable to get the accepted solution to work. Maybe I have done something wrong but regardless I cannot get it to work, and its logic seems really... odd. There has to be an easier way.


回答1:


The key to manipulating the rankings is to use a union. For each column you use a separate select statement. In that statement, add an identifier that shows from which column each row was pulled then. Insert the results into a table variable, then you can manipulate the ranking by sorting on the identifier or multiplying the rank by some value based on the identifier.

The key is to give the appearance of modifying the ranking, not to actually change sql server's ranking.

Example using a table variable:

DECLARE @Results TABLE (PersonId Int, Rank Int, Source Int)

For table People with Columns PersonId Int PK Identity, FirstName VarChar(100), MiddleName VarChar(100), LastName VarChar(100), AlsoKnown VarChar(100) with each column added to a full text catalog, you could use the query:

INSERT INTO @Results (PersonId, Rank, Source)

SELECT PersonId, Rank, 1
FROM ContainsTable(People, FirstName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 2
FROM ContainsTable(People, MiddleName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 3
FROM ContainsTable(People, LastName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 4
FROM ContainsTable(People, AlsoKnown, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

/*
Now that the results from above are in the @Results table, you can manipulate the
rankings in one of several ways, the simplest is to pull the results ordered first by Source then by Rank.  Of course you would probably join to the People table to pull the name fields.
*/

SELECT PersonId
FROM @Results
ORDER BY Source, Rank DESC

/*
A more complex manipulation would use a statement to multiply the ranking by a value above 1 (to increase rank) or less than 1 (to lower rank), then return results based on the new rank.  This provides more fine tuning, since I could make first name 10% higher and middle name 15% lower and leave last name and also known the original value.
*/

SELECT PersonId, CASE Source WHEN 1 THEN Rank * 1.1 WHEN 2 THEN Rank * .9 ELSE Rank END AS NewRank FROM @Results
ORDER BY NewRank DESC

The one downside is you'll notice I didn't use UNION ALL, so if a word appears in more than one column, the rank won't reflect that. If that's an issue you could use UNION ALL and then remove duplicate person id's by adding all or part of the duplicate record's rank to the rank of another record with the same person id.




回答2:


Ranks are useless across indexes, you can't merge them and expect the result to mean anything. The rank numbers of each index are apple/orange/grape/watermelon/pair comparisions that have no relative meaning WRT contents of other indexes.

Sure you can try and link/weight/order ranks between indexes to try and fudge a meaningful result but at the end of the day that result is still gibberish however possibly still good enough to provide a workable solution depending on the specifics of your situation.

In my view the best solution is to put all data you intend to be searchable in a single FTS index/column and use that columns rank to order your output.. Even if you have to duplicate field contents to accomplish the result.




回答3:


Just few weeks ago I was solving very similar problem and solution is suprisingly easy (albeit ugly and space consuming). Create another column containing combined values of FirstName + FirstName + LastName + MiddleName in this order. Duplicate FirstName column is not a typo, it's a trick to force FT to weight values from FirstName higher during search.




回答4:


I assume the data returned is joined to other tables within your schema? I would develop your own RANK based on columns from associated data to the full text index. This also provides a guaranteed level of accuracy in the RANK value.




回答5:


How about this way:

    SELECT p.* from Person p
left join ContainsTable(Person, FirstName, @SearchValue) firstnamefilter on firstnamefiler.key = p.id
left join ContainsTable(Person, MiddleName, @SearchValue) middlenamefilter on middlenamefilter.key = p.id
where (firstnamefilter.rank is not null or middlenamefilter.rank is not null)
order by firstnamefilter.rank desc, middlenamefilter.rank desc

This will produce a record for each Person record where either the first or middle names (or both) match on the search term, and order by all matches against the first name first (in descending rank order), followed by all matches against the middle name (again in descending rank order)



来源:https://stackoverflow.com/questions/310425/sql-server-weighted-full-text-search

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!