MultiFieldQueryParser is removing dots from the acronym

Submitted by 旧街凉风 on 2019-12-02 07:35:16
itsadok

As you mentioned, this is a dupe of this question. I suggest you at least add a link to it in your question. Also, I would urge you to create a user account, since right now it's not possible to look at your old question to get context.

The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.

I would suggest you run some more basic test cases to eliminate other factors. Try using an ordinary QueryParser instead of a multi-field one.

Here's some code I wrote to play with the StandardAnalyzer:

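import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Uses the older (Lucene 2.x era) API, where TokenStream.next() returns the next Token.
StringReader testReader = new StringReader("C.F.A. C.F.A word");
StandardAnalyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("title", testReader);
// Print the three tokens the analyzer produces for the test string.
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());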

The output for this, by the way, was:

(cfa,0,6,type=<ACRONYM>)
(c.f.a,7,12,type=<HOST>)
(word,13,17,type=<ALPHANUM>)

Note, for example, that if the acronym doesn't end with a dot then the analyzer assumes it's an internet host name, so searching for "C.F.A" will not match "C.F.A." in the text.
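To make the matching consequence concrete, here is a minimal stdlib-only sketch. It is not Lucene code: normalize and index are hypothetical helpers that only imitate the rule visible in the output above (dots are dropped when every letter is followed by a dot, otherwise the token is just lowercased), and the sample sentence is made up.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class AcronymSketch {
    // Toy stand-in for the analyzer (an assumption, not the real StandardAnalyzer).
    public static String normalize(String token) {
        // Two or more letters, each followed by a dot, e.g. "C.F.A.": treat as an acronym.
        if (token.matches("([A-Za-z]\\.){2,}")) {
            return token.replace(".", "").toLowerCase(Locale.ROOT);
        }
        // Anything else, including "C.F.A" without the trailing dot, is only lowercased.
        return token.toLowerCase(Locale.ROOT);
    }

    // "Index" a text by normalizing each whitespace-separated token.
    public static Set<String> index(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(normalize(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = index("He holds the C.F.A. charter");
        System.out.println(terms.contains(normalize("C.F.A."))); // true: same rule on both sides
        System.out.println(terms.contains(normalize("C.F.A")));  // false: becomes "c.f.a"
        System.out.println(terms.contains("C.F.A."));            // false: query was never analyzed
    }
}
```

The third lookup is the situation to rule out first: if the query side skips the analysis that the index side applied, the stored term is "cfa" and the literal "C.F.A." can never match it.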

(I'm only familiar with Java Lucene, but I imagine that it doesn't matter in this case.)

The purpose of the analyzers is to strip away characters and formatting that prevent effective full-text search. For example, if you write a document where you only refer to Lucene as "lucene.net", you'd probably want Lucene to return search hits for just "lucene" as well. Therefore the StandardAnalyzer strips the dots (as well as some other special characters).

Don't worry though. As always with Lucene, this can be configured, in this case by choosing a different analyzer. Try using SimpleAnalyzer or KeywordAnalyzer instead, and see which one is closest to your desired behaviour. If neither of them will do, you can even implement your own custom analyzer by subclassing Analyzer. It's actually quite simple.
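Before swapping analyzers, it may help to see roughly what those two would do to the acronym. The sketch below is a stdlib-only imitation, not the Lucene classes: keywordStyle and simpleStyle are hypothetical helpers that mimic the documented behaviour (KeywordAnalyzer emits the whole input as a single token, SimpleAnalyzer splits at non-letters and lowercases).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AnalyzerChoiceSketch {
    // Imitates KeywordAnalyzer: the entire input is one token, left unchanged.
    public static List<String> keywordStyle(String text) {
        return Collections.singletonList(text);
    }

    // Imitates SimpleAnalyzer: split wherever a non-letter occurs, then lowercase.
    public static List<String> simpleStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(keywordStyle("C.F.A. word")); // [C.F.A. word]
        System.out.println(simpleStyle("C.F.A. word"));  // [c, f, a, word]
    }
}
```

Under these assumptions the keyword style keeps the dots but only suits fields that hold a single exact value, while the simple style breaks the acronym into one-letter terms. That trade-off is usually what pushes people toward a custom analyzer.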

Good luck. :)
