Question
I have a full-text search table. It generally works well, but in some cases it fails.
For example, when I search for ' "red*" NEAR "color*" '
it works properly.
However, if I search for '"the*" NEAR "red*"'
it fails. It doesn't work for any term that starts with "the".
-- working case
SELECT *
FROM MyTable
WHERE CONTAINS(MyColumn, ' "red*" NEAR "color*" ')
-- failed case
SELECT *
FROM MyTable
WHERE CONTAINS(MyColumn, ' "the*" NEAR "red*" ')
Does anyone know why?
Answer 1:
What you are experiencing is the concept of stopwords (aka noise words) in full-text search. Most full-text search engines have a list of very common words that are ignored in searches because they are not specific enough to be considered relevant.
In SQL Server you can display the system stopwords for the English language with this query (and I bet that 'the' is part of that list); note that sys.fulltext_stopwords only covers custom stoplists, so the system list is queried here:
select * from sys.fulltext_system_stopwords where language_id = 1033
You can manage the stopwords by creating a custom stoplist.
It is also possible to disable stopwords, although I would not recommend that:
alter fulltext index on mytable set stoplist = off
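To check this for a specific index rather than guess, something like the following may help. It is only a sketch: it assumes the table lives in the dbo schema, that the column is indexed as English (LCID 1033), and that you are allowed to call sys.dm_fts_parser, which requires sysadmin membership.

```sql
-- Show which stoplist (if any custom one) is attached to the full-text index.
-- stoplist_name is NULL when no custom stoplist is attached.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.stoplist_id,
       s.name AS stoplist_name
FROM sys.fulltext_indexes AS i
LEFT JOIN sys.fulltext_stoplists AS s
       ON s.stoplist_id = i.stoplist_id
WHERE i.object_id = OBJECT_ID(N'dbo.MyTable');  -- dbo is an assumption

-- Ask the word breaker how it treats each term of a phrase.
-- Arguments: query string, LCID (1033 = English),
-- stoplist_id (0 = system stoplist, NULL = no stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"the red color"', 1033, 0, 0);
```

The special_term column reports how each token is classified, so a stopword should show up there as a noise word instead of a regular match.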
Answer 2:
"The", in a full-text index, is a stopword (or "noise word"). This means that the word will not be indexed, nor will it be searchable using CONTAINS. This is discussed at the very start of the documentation Configure and Manage Stopwords and Stoplists for Full-Text Search:
To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.
Stopwords. A stopword can be a word with meaning in a specific language. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search. A stopword can also be a token that does not have linguistic meaning.
Emphasis added.
If that answer is to be trusted, you could remove the stoplist from your full-text index and then create a custom one that leaves out 'the', as discussed in this answer on DBA, which I based this example on:
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST = OFF;
CREATE FULLTEXT STOPLIST NoTheStopList;
ALTER FULLTEXT STOPLIST NoTheStopList ADD 'are' LANGUAGE 'British';
ALTER FULLTEXT STOPLIST NoTheStopList ADD 'a' LANGUAGE 'British';
ALTER FULLTEXT STOPLIST NoTheStopList ADD 'is' LANGUAGE 'British';
ALTER FULLTEXT STOPLIST NoTheStopList ADD 'and' LANGUAGE 'British';
...
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST = NoTheStopList;
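As an alternative to adding every stopword by hand, a stoplist can be copied from the system stoplist and then have single words dropped from it. The sketch below assumes English (LCID 1033) is the language the column is indexed with, and SystemMinusThe is a made-up name for illustration.

```sql
-- Hypothetical stoplist name; starts as a copy of the system stoplist.
CREATE FULLTEXT STOPLIST SystemMinusThe FROM SYSTEM STOPLIST;

-- Remove only 'the' for English (LCID 1033); all other stopwords stay.
ALTER FULLTEXT STOPLIST SystemMinusThe DROP 'the' LANGUAGE 1033;

-- Attach the new stoplist to the full-text index.
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST = SystemMinusThe;
```

Changing the stoplist should cause the index to repopulate, so 'the' will most likely only become searchable once that population has finished.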
Source: https://stackoverflow.com/questions/59271411/sql-server-fulltext-contains-not-working-in-some-words