Handling + as a special character in Lucene search

Posted by 拟墨画扇 on 2020-01-12 18:35:49

Question


How do I make sure Lucene gives me back relevant search results when my input string contains terms like c++? Lucene seems to ignore the ++ characters.

Code details: when I execute this line, I get a blank search query.

queryField = multiFieldQueryParser.Parse(inpKeywords);

keywordsQuery.Add(queryField, BooleanClause.Occur.SHOULD);

And here is my custom analyzer:

public class CustomAnalyzer : Analyzer
{
    private static readonly WhitespaceAnalyzer whitespaceAnalyzer = new WhitespaceAnalyzer();

    public override TokenStream TokenStream(String fieldName, System.IO.TextReader reader)
    {
        TokenStream result = whitespaceAnalyzer.TokenStream(fieldName, reader);
        result = new StandardTokenizer(reader);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stop_words);
        return result;
    }
}

And I'm executing the search query this way:

indexSearcher.Search(searchQuery, collector);

I did try queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords)); but it still does not work. Here is the query that gets executed and returns zero hits: "+(())"

Thanks.


Answer 1:


Since + is a special character in the query syntax, it needs to be escaped. The list of all characters that need to be escaped is here (see the bottom of the page).

You also need to be careful about the analyzer you use while indexing. For example, StandardAnalyzer will skip +, so c++ is indexed as c. You may need to use something like WhitespaceAnalyzer, which preserves special characters in the token stream. Keep in mind that you need to use the same analyzer while indexing and searching.
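To make the analyzer point concrete, here is a minimal self-contained sketch that does not use Lucene at all. The class TokenizerSketch and both of its methods are hypothetical stand-ins: one splits on whitespace only (roughly what a whitespace-based analyzer does), the other keeps only runs of letters and digits (a heavily simplified approximation of a standard-style tokenizer). The real Lucene tokenizers follow more detailed rules, so treat this only as an illustration of why ++ disappears.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Hypothetical stand-in for a whitespace-based analyzer:
    // split on whitespace only and lowercase, so "C++" survives as "c++".
    public static List<string> WhitespaceTokens(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToList();
    }

    // Hypothetical, simplified stand-in for a standard-style tokenizer:
    // keep only runs of letters and digits, so punctuation such as "++" is dropped.
    public static List<string> AlphanumericTokens(string text)
    {
        return Regex.Matches(text, @"[\p{L}\p{N}]+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", WhitespaceTokens("C++ developer")));   // prints c++ | developer
        Console.WriteLine(string.Join(" | ", AlphanumericTokens("C++ developer"))); // prints c | developer
    }
}
```

If the index was built with the second kind of tokenization, a search for c++ can at best match the term c, no matter how the query string is escaped.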




Answer 2:


In addition to choosing the right analyzer, you can use QueryParser.Escape(string s) to ensure all special characters are properly escaped.

Because this is a static function, you can use it even if you're using MultiFieldQueryParser.

For example, you can try something like this:

queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords));
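As a rough illustration of what escaping does to the input, the sketch below prefixes each query-syntax character with a backslash. It is a hand-written approximation, not the library's implementation: the class QueryEscapeSketch and the SpecialChars list are assumptions made for this example, and the exact set of escaped characters depends on the Lucene version in use.

```csharp
using System;
using System.Text;

public static class QueryEscapeSketch
{
    // Assumed list of query-syntax characters for this sketch;
    // check the documentation of your Lucene version for the real set.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Returns the input with a backslash inserted before each special character.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Escape("c++"));      // prints c\+\+
        Console.WriteLine(Escape("(a+b):c"));  // prints \(a\+b\)\:c
    }
}
```

Escaping only stops the query parser from reading + as an operator. The escaped text is still passed through the analyzer, so if the analyzer drops ++ the resulting query can still end up empty.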



Answer 3:


Try UTF-8 encoding your search queries.

You can enable this as described in this article.



Source: https://stackoverflow.com/questions/1598465/handling-as-a-special-character-in-lucene-search
