Lucene wildcard query with space

自古美人都是妖i submitted on 2020-04-10 09:16:08

Question


I have a Lucene index that contains city names. Suppose I want to search for 'New Delhi'. I have the string 'New Del', which I want to pass to the Lucene searcher, and I expect 'New Delhi' as the result. If I generate a query like Name:New Del*, it returns all cities with 'New' and 'Del' in them. Is there any way to create a Lucene wildcard query that contains spaces? I referred to and tried a few of the solutions given at http://www.gossamer-threads.com/lists/lucene/java-user/5487


Answer 1:


It sounds like you have indexed your city names with an analyzer that tokenizes them. With that kind of analysis, "new" and "delhi" are indexed as separate terms and must be treated as such. A wildcard is matched against one term at a time, so a single wildcard pattern cannot span the space between the two terms.
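To make the effect of analysis concrete, here is a toy model in plain Java. It is not Lucene code: it assumes the analyzer does nothing more than lowercase the text and split it on whitespace, which is a simplification of what a real analyzer does, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizedWildcardDemo {

    // Simplified stand-in for an analyzer (assumption): lowercase, then split on whitespace.
    public static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Translate a wildcard pattern ('*' = any run of characters, '?' = one character) into a regex.
    public static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // The wildcard is tested against one indexed term at a time, never across terms.
    public static boolean anyTermMatches(List<String> terms, String wildcard) {
        Pattern p = toRegex(wildcard);
        return terms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        List<String> analyzed = analyze("New Delhi");               // two terms: new, delhi
        List<String> untokenized = Arrays.asList("new delhi");      // one term

        System.out.println(anyTermMatches(analyzed, "new del*"));     // false
        System.out.println(anyTermMatches(untokenized, "new del*"));  // true
        System.out.println(anyTermMatches(analyzed, "del*"));         // true
    }
}
```

In this model the pattern "new del*" only finds something when the whole city name was stored as a single term, which is the motivation for the options that follow.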

The easiest solution would be to index your city names without tokenization (lowercasing them is still a good idea, though). Then you would be able to search with the query parser simply by escaping the space:

QueryParser parser = new QueryParser("defaultField", analyzer);
Query query = parser.parse("cityname:new\\ del*");

Or you could use a simple WildcardQuery:

Query query = new WildcardQuery(new Term("cityname", "new del*"));
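If the search text comes from user input, the query string has to be assembled at runtime. The following is a minimal sketch in plain Java, assuming the field was indexed as a single lowercased term as suggested above. The class CityQueryStrings and the method prefixQuery are hypothetical helpers, not part of Lucene; the sketch only builds the string that would then be handed to parser.parse(...).

```java
public class CityQueryStrings {

    // Characters with special meaning in the classic query parser syntax, plus the space.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/ ";

    // Hypothetical helper: lowercase the input, backslash-escape each special
    // character (including spaces), and append a trailing '*' for prefix matching.
    public static String prefixQuery(String field, String userInput) {
        String normalized = userInput.trim().toLowerCase();
        StringBuilder sb = new StringBuilder(field).append(':');
        for (char c : normalized.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('*').toString();
    }

    public static void main(String[] args) {
        // Prints: cityname:new\ del*
        System.out.println(prefixQuery("cityname", "New Del"));
    }
}
```

The result is the same query text as the escaped string literal in the parser example above. When constructing a WildcardQuery directly, no backslash before the space is needed, so only the lowercasing and the trailing '*' would apply.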

If the field is analyzed with the standard analyzer, you will need to rely on span queries, something like this:

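// The exact term "new", followed by any term that starts with "del"
SpanQuery queryPart1 = new SpanTermQuery(new Term("cityname", "new"));
SpanQuery queryPart2 = new SpanMultiTermQueryWrapper<>(new WildcardQuery(new Term("cityname", "del*")));
// slop 0 and inOrder true: the two parts must be adjacent and in this order
Query query = new SpanNearQuery(new SpanQuery[] {queryPart1, queryPart2}, 0, true);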
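What the ordered, zero-slop span query asks for can be pictured with another toy model in plain Java. Again, this is not Lucene code: the analyzer is assumed to only lowercase and split on whitespace, the names OrderedSpanDemo and nearInOrder are made up, and the city names other than New Delhi are invented test values.

```java
import java.util.Arrays;
import java.util.List;

public class OrderedSpanDemo {

    // Simplified analyzer (assumption): lowercase and split on whitespace.
    // The list index plays the role of the term position.
    public static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // True if a term equal to `first` is immediately followed by a term starting with `prefix`.
    public static boolean nearInOrder(List<String> terms, String first, String prefix) {
        for (int i = 0; i + 1 < terms.size(); i++) {
            if (terms.get(i).equals(first) && terms.get(i + 1).startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(nearInOrder(analyze("New Delhi"), "new", "del"));       // true
        System.out.println(nearInOrder(analyze("Delhi New Town"), "new", "del"));  // false: wrong order
        System.out.println(nearInOrder(analyze("New Port Delta"), "new", "del"));  // false: not adjacent
    }
}
```

The second and third cases contain both a "new" term and a "del..." term, so a query that only requires both terms to be present would accept them; the position check is what rules them out.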

Or, you can use the surround query parser (which provides query syntax intended to provide more robust support of span queries), using a query like W(new, del*):

org.apache.lucene.queryparser.surround.parser.QueryParser surroundparser = new org.apache.lucene.queryparser.surround.parser.QueryParser();
SrndQuery srndquery = surroundparser.parse("W(new, del*)");
query = srndquery.makeLuceneQueryField("cityname", new BasicQueryFactory());



Answer 2:


As I learned from the thread you mentioned (http://www.gossamer-threads.com/lists/lucene/java-user/5487), you can either do an exact match that includes the space or apply a wildcard to each part separately.

So something like this should work - [New* Del*]



Source: https://stackoverflow.com/questions/34529261/lucene-wildcard-query-with-space
