Noise Words in SQL Server 2005 Full-Text Search

栀梦 2021-01-03 08:27

I am attempting to use full-text search over a series of names in my database. This is my first attempt at using full-text search. Currently I take the search string enter

2 Answers
  • 2021-01-03 08:55

    Full-text search works off the search criteria you give it. You can remove a noise word from the noise-word file, but doing so risks bloating your index, because every occurrence of that word is then indexed. Robert Cain's blog has a lot of good information on this:

    http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

    To save some time, look at how the following method strips the noise words, and copy its code and word list:

        public string PrepSearchString(string sOriginalQuery)
        {
            string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

            // Each entry keeps its surrounding spaces, so " a " cannot match inside "about".
            string[] arrNoiseWord = strNoiseWords.Split('|');

            // Pad the query so noise words at the very start or end are matched too, and
            // lower-case it because the list is lower-case (full-text search ignores case anyway).
            sOriginalQuery = " " + sOriginalQuery.ToLowerInvariant() + " ";

            foreach (string noiseword in arrNoiseWord)
            {
                sOriginalQuery = sOriginalQuery.Replace(noiseword, " ");
            }

            // Collapse runs of spaces; a single Replace("  ", " ") only halves them.
            while (sOriginalQuery.Contains("  "))
            {
                sOriginalQuery = sOriginalQuery.Replace("  ", " ");
            }
            return sOriginalQuery.Trim();
        }
    

    However, I would probably go with Regex.Replace for this, which should be faster than looping over the list with one String.Replace call per noise word. I just don't have a quick example to post.
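
    The paragraph above suggests Regex.Replace but gives no example, so here is a minimal C# sketch of that idea; it is not the answerer's code. It compiles one pattern from the noise words and strips every match in a single pass. The class name `NoiseWordFilter` and the shortened word list are hypothetical: for real use, paste in the full list from `PrepSearchString`. `Regex.Escape` makes symbol entries such as `$` literal, and the `(?<!\S)` / `(?!\S)` lookarounds require each match to be a whole whitespace-delimited token, so `a` never matches inside `about`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoiseWordFilter
{
    // Hypothetical, shortened list for illustration; paste the full list in practice.
    private static readonly string[] NoiseWords =
    {
        "about", "after", "all", "also", "an", "and", "the", "of", "a", "$", "&", "-"
    };

    // Builds (?<!\S)(?:about|after|...|\$|&|-)(?!\S) once. Regex.Escape turns "$" into "\$";
    // the lookarounds only accept a match bounded by whitespace or the ends of the string.
    private static readonly Regex NoiseWordRegex = new Regex(
        @"(?<!\S)(?:" + string.Join("|", NoiseWords.Select(w => Regex.Escape(w))) + @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string PrepSearchString(string sOriginalQuery)
    {
        // Replace each noise word with a space, then collapse runs of whitespace.
        string stripped = NoiseWordRegex.Replace(sOriginalQuery, " ");
        return Whitespace.Replace(stripped, " ").Trim();
    }
}
```

    For example, `NoiseWordFilter.PrepSearchString("The history of Rome")` returns `"history Rome"`. Because the pattern is matched against the original string, a noise word at the start or end, or the same word twice in a row ("a a"), is removed as well; these are cases a space-padded String.Replace loop can miss.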

  • 2021-01-03 08:57

    Here's a working function. The file noiseENU.txt is copied as-is from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData.

        Imports System.Text.RegularExpressions

        Public Function StripNoiseWords(ByVal s As String) As String
            ' ReadFile (not shown) returns the contents of the noise-word file as a string
            Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
            Dim NoiseWordsRegex As String = Regex.Replace(NoiseWords, "\s+", "|") ' about|after|all|also etc.
            NoiseWordsRegex = String.Format("\s?\b(?:{0})\b\s?", NoiseWordsRegex)
            Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
            Result = Regex.Replace(Result, "\s+", " ") ' eliminate any multiple spaces
            Return Result.Trim() ' drop the space left where a leading or trailing noise word was removed
        End Function
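
    For C# readers, here is a rough counterpart of the VB function, again a sketch rather than the answerer's code. It assumes you pass the path to your own copy of noiseENU.txt (the VB version relies on its own ReadFile helper and a site-relative path), and it builds the regex once so it can be cached instead of re-reading the file on every call. It also swaps `\b` for whitespace lookarounds: `\b` treats punctuation as a word boundary, so with a single-letter entry such as `x`, a `\b`-based pattern also strips the x from "x-ray". `NoiseWordFile` and `Load` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoiseWordFile
{
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Reads a whitespace-separated noise-word file and compiles a single pattern
    // from it. Call once and reuse the result.
    public static Regex Load(string path)
    {
        // A null separator array makes Split break on any whitespace.
        string[] words = File.ReadAllText(path)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Escape each entry and require whole whitespace-delimited tokens, so "x"
        // matches the word "x" but not the "x" in "x-ray".
        string pattern = @"(?<!\S)(?:" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")(?!\S)";
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    public static string StripNoiseWords(string s, Regex noiseWords)
    {
        string result = noiseWords.Replace(s, " ");    // replace each noise word with a space
        return Whitespace.Replace(result, " ").Trim(); // collapse multiple spaces, trim the ends
    }
}
```

    Call `Load` once, for example into a static field at application start, and pass the result to `StripNoiseWords` for each query.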
    