SQL query to match keywords?

廉价感情. 提交于 2019-12-19 04:53:14

问题


I have a table with a column as nvarchar(max) with text extracted from word documents in it. How can I create a select query that I'll pass another a list of keywords as parameter and return the rows ordered by the number of matches?

Maybe it is possible with full text search?


回答1:


Yes, possible with full text search, and likely the best answer. For a straight T-SQL solution, you could use a split function and join, e.g. assuming a table of numbers called dbo.Numbers (you may need to decide on a different upper limit):

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 200000;

WITH n AS
(
    SELECT
        rn = ROW_NUMBER() OVER
        (ORDER BY s1.[object_id])
    FROM sys.objects AS s1
    CROSS JOIN sys.objects AS s2
    CROSS JOIN sys.objects AS s3
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

And a splitting function that uses that table of numbers:

CREATE FUNCTION dbo.SplitStrings
(
    @List NVARCHAR(MAX)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT DISTINCT
            [Value] = LTRIM(RTRIM(
                SUBSTRING(@List, [Number],
                CHARINDEX(N',', @List + N',', [Number]) - [Number])))
        FROM
            dbo.Numbers
        WHERE
            Number <= LEN(@List)
            AND SUBSTRING(N',' + @List, [Number], 1) = N','
    );
GO

Then you can simply say:

SELECT key, NvarcharColumn /*, other cols */
FROM dbo.table AS outerT
WHERE EXISTS
(
  SELECT 1 
    FROM dbo.table AS t 
    INNER JOIN dbo.SplitStrings(N'list,of,words') AS s
    ON t.NvarcharColumn LIKE '%' + s.Item + '%'
    WHERE t.key = outerT.key
);

As a procedure:

CREATE PROCEDURE dbo.Search
    @List NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT key, NvarcharColumn /*, other cols */
    FROM dbo.table AS outerT
    WHERE EXISTS
    (
      SELECT 1 
        FROM dbo.table AS t 
        INNER JOIN dbo.SplitStrings(@List) AS s
        ON t.NvarcharColumn LIKE '%' + s.Item + '%'
        WHERE t.key = outerT.key
    );
END
GO

Then you can just pass in @List (e.g. EXEC dbo.Search @List = N'foo,bar,splunge') from C#.

This won't be super fast, but I'm sure it will be quicker than pulling all the data out into C# and double-nested loop it manually.




回答2:


how to ... return the rows ordered by the number of [full-text] matches

I have not used it myself but believe SQL Server 2008 supports weighting the CONTAINSTABLE matches which might be of help to you:

            http://msdn.microsoft.com/en-us/library/ms189760.aspx

If you don't have an engine that returns results weighted by the number of hits ...

You could write a UDF that takes two inputs and returns an integer: the big textvalue is the first input and the words you're looking for as a comma-delimited string is the second. The function returns an integer representing either the number of distinct looked-for words that were actually found at least once in the text, or the total number of times the looked-for words were found. Implementation --how to weight-- is up to you. Maybe, for example, you'd want to arrange the looked-for words in most-important to least-important order, and give an important word hit more weight than a less important word hit.

You could then use your full text search engine to find all records that contain at least one of the words (you'd OR them), and you'd run this result set through your UDF scalar function:

             pseudo code

            select title,  weightfunction(summary, 'word1,word2,word3....wordN')
            from docs
            where summary contains ( word1 or word2 or word3 ... or wordN)
            order by weightfunction(summary, 'word1,word2,word3....wordN') desc


来源:https://stackoverflow.com/questions/7274514/sql-query-to-match-keywords

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!