Why is this Lucene query a “contains” instead of a “startsWith”?

Submitted by 为君一笑 on 2019-12-07 04:51:18

Question


string q = "a";
Query query = new QueryParser("company", new StandardAnalyzer()).Parse(q+"*");

This produces a PrefixQuery: company:a*

Still, I get results like "Fleet Africa", where the "A" is clearly not at the start of the value, so the results are not what I want.

Query query = new TermQuery(new Term("company", q+"*"));

This produces a TermQuery, company:a*, and returns no results, probably because it is treated as an exact match and none of my values is the literal "a*".

Query query = new WildcardQuery(new Term("company", q+"*"));

This returns the same results as the PrefixQuery.

What am I doing wrong?


Answer 1:


StandardAnalyzer tokenizes "Fleet Africa" into the two terms "fleet" and "africa". Your a* search matches the latter term.
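To make the effect concrete, here is a minimal sketch in plain C# with no Lucene dependency. The `Tokenize` helper is a hypothetical, rough stand-in for StandardAnalyzer (it only splits on non-alphanumeric characters and lowercases), and `MatchesPrefix` models a prefix query as "any indexed term starts with the prefix". Neither is a Lucene API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class TokenPrefixDemo
{
    // Rough stand-in for StandardAnalyzer (an assumption, not the real implementation):
    // split on non-alphanumeric characters and lowercase each token.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Models a prefix query: the document matches if ANY of its terms starts with the prefix.
    public static bool MatchesPrefix(string fieldValue, string prefix)
    {
        return Tokenize(fieldValue).Any(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

With this model, `TokenPrefixDemo.MatchesPrefix("Fleet Africa", "a")` is true because the term "africa" starts with "a", even though the field value itself starts with "F".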

If you want "Fleet Africa" to be treated as one single term, use an analyzer that does not split your string on whitespace. KeywordAnalyzer is one example, but you may still want to lowercase your data so that queries are case-insensitive.
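As a rough illustration of the single-term approach, the sketch below treats the whole lowercased field value as one term and checks the prefix against it. It is plain C#, not Lucene code: `ToSingleTerm` and `MatchesPrefix` are hypothetical helpers that only model what a keyword-style analyzer plus lowercasing would index, and "Africa Tours" in the usage note is a made-up sample value.

```csharp
using System;

public static class KeywordPrefixDemo
{
    // Stand-in for a keyword-style analyzer plus lowercasing (an assumption):
    // the entire field value becomes exactly one term.
    public static string ToSingleTerm(string fieldValue)
    {
        return fieldValue.ToLowerInvariant();
    }

    // The prefix is compared against that single term, so it can only match
    // at the start of the field value.
    public static bool MatchesPrefix(string fieldValue, string prefix)
    {
        return ToSingleTerm(fieldValue)
            .StartsWith(prefix.ToLowerInvariant(), StringComparison.Ordinal);
    }
}
```

Here `KeywordPrefixDemo.MatchesPrefix("Fleet Africa", "a")` is false, while `KeywordPrefixDemo.MatchesPrefix("Africa Tours", "A")` is true because both sides are lowercased before comparing.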




Answer 2:


The short answer: none of your queries constrains the search to the start of the field. You need an EdgeNGramTokenFilter or something like it. See this question for an implementation of autocomplete in Lucene.
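The idea behind edge n-grams can be sketched in plain C# without Lucene. The hypothetical `EdgeNGrams` helper below emits the leading substrings of the lowercased field value, and `Matches` does an exact lookup of the typed text among them. It assumes the whole field value is treated as a single token before the n-grams are generated; it is not the EdgeNGramTokenFilter API, and the `minGram`/`maxGram` parameter names and defaults are chosen here for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramDemo
{
    // Emit the leading substrings of the lowercased value,
    // from minGram up to maxGram characters long.
    public static List<string> EdgeNGrams(string fieldValue, int minGram, int maxGram)
    {
        string term = fieldValue.ToLowerInvariant();
        var grams = new List<string>();
        int upper = Math.Min(maxGram, term.Length);
        for (int len = minGram; len <= upper; len++)
        {
            grams.Add(term.Substring(0, len));
        }
        return grams;
    }

    // With edge n-grams indexed, "starts with" becomes an exact term lookup
    // of whatever the user has typed so far.
    public static bool Matches(string fieldValue, string typed, int minGram = 1, int maxGram = 20)
    {
        return EdgeNGrams(fieldValue, minGram, maxGram).Contains(typed.ToLowerInvariant());
    }
}
```

For example, `EdgeNGramDemo.EdgeNGrams("Fleet", 1, 3)` yields "f", "fl", "fle", so `EdgeNGramDemo.Matches("Fleet Africa", "fle")` is true while `EdgeNGramDemo.Matches("Fleet Africa", "a")` is false. The trade-off in this design is a larger index in exchange for cheap exact-match lookups at query time.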



Source: https://stackoverflow.com/questions/605899/why-is-this-lucene-query-a-contains-instead-of-a-startswith
