Lucene phrase match with a wildcard at the end

本秂侑毒 submitted on 2020-01-05 12:11:32

Question


I'm trying to build a predictive text search: as the user types, results matching the text so far come through

E.g. with "ca" they can get "cat in the hat", "my calculus is cool", "cat dog mouse"

However, if the user keeps typing and the input contains spaces, I want the whole input to be treated as a single phrase

E.g. "cat i" should find "cat in the hat"

but NOT "[cat] dog mouse" nor "my calculus [i]s cool"

This is my current code; however, it does not seem to work as I'd hoped:

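import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.{ConstantScoreQuery, DisjunctionMaxQuery, Sort, SortField, TopFieldCollector}
import org.apache.lucene.util.Version

// queryString, Limit and searcher are defined elsewhere in the application.

// Sort by relevance score first, then by popularity (descending).
val mySort = new Sort(SortField.FIELD_SCORE, new SortField("popularity", SortField.Type.INT, true))
val analyzer = new StandardAnalyzer(Version.LUCENE_43)

// Title: parse the raw user input with a trailing wildcard appended.
val parser: QueryParser = new QueryParser(Version.LUCENE_43, "title", analyzer)
val query = parser.parse(queryString+"*")
val titleQuery = new ConstantScoreQuery(query)
titleQuery.setBoost(2)

// Synopsis: parse the raw user input as-is.
val synopsisQuery = new QueryParser(Version.LUCENE_43, "synopsis", analyzer).parse(queryString)
val summaryQuery = new ConstantScoreQuery(synopsisQuery)

// Combine both field queries; a tie-breaker of 0 means only the best-scoring clause counts.
val finalQuery = new DisjunctionMaxQuery(0)
finalQuery.add(titleQuery)
finalQuery.add(summaryQuery)

val collector = TopFieldCollector.create(mySort,Limit,false,true,true,false)

searcher.search(finalQuery, collector)

collector.topDocs().scoreDocs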

Answer 1:


There are basically two ways to achieve this.

The old way is to construct a MultiPhraseQuery manually - see this answer for details.
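
The linked answer is not reproduced here, so the following is only a rough sketch of what building a MultiPhraseQuery by hand might look like against the Lucene 4.x API, not necessarily that answer's approach. The object name ManualPhrasePrefix and its two helper methods are hypothetical. It assumes the caller already has an open IndexReader and passes terms in the same form the analyzer indexed them (lowercased, for StandardAnalyzer).

```scala
import org.apache.lucene.index.{IndexReader, MultiFields, Term, TermsEnum}
import org.apache.lucene.search.MultiPhraseQuery
import org.apache.lucene.util.{BytesRef, StringHelper}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, written against the Lucene 4.x API.
object ManualPhrasePrefix {

  /** Collects every indexed term of `field` that starts with `prefix`. */
  def expandPrefix(reader: IndexReader, field: String, prefix: String): Array[Term] = {
    val found = ArrayBuffer.empty[Term]
    val terms = MultiFields.getTerms(reader, field) // null when the field has no terms
    if (terms != null) {
      val prefixBytes = new BytesRef(prefix)
      val termsEnum = terms.iterator(null)
      // Position on the first term >= prefix, then walk forward while the prefix still matches.
      var current: BytesRef =
        if (termsEnum.seekCeil(prefixBytes) == TermsEnum.SeekStatus.END) null
        else termsEnum.term()
      while (current != null && StringHelper.startsWith(current, prefixBytes)) {
        // The enum reuses its BytesRef, so copy it before storing.
        found += new Term(field, BytesRef.deepCopyOf(current))
        current = termsEnum.next()
      }
    }
    found.toArray
  }

  /** Full terms at fixed positions, followed by any term matching the prefix.
    * Returns None when no indexed term matches the prefix. */
  def build(reader: IndexReader, field: String,
            fullTerms: Seq[String], prefix: String): Option[MultiPhraseQuery] = {
    val lastTerms = expandPrefix(reader, field, prefix)
    if (lastTerms.isEmpty) None
    else {
      val query = new MultiPhraseQuery()
      fullTerms.foreach(t => query.add(new Term(field, t)))
      query.add(lastTerms) // alternatives for the final position
      Some(query)
    }
  }
}
```

For the input cat i, a call such as ManualPhrasePrefix.build(reader, "title", Seq("cat"), "i") would put "cat" in the first position and every indexed title term starting with "i" in the second.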

The new way is simpler though: construct a SpanNearQuery. Use the following parameters: inOrder = true and slop = 0 to get an equivalent of PhraseQuery.

Every clause in the SpanNearQuery except the last one should be a SpanTermQuery, one for each full term contained in your phrase.

The last clause should be a SpanMultiTermQueryWrapper<PrefixQuery>, wrapping a PrefixQuery. Use the last term of your phrase as the prefix value.

To sum up, for cat i:

SpanNearQuery [inOrder = true, slop = 0]
 |
 +-- SpanTermQuery [term = "cat"]
 |
 +-- SpanMultiTermQueryWrapper<PrefixQuery>
      |
      +-- PrefixQuery [prefix = "i"]
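
The structure above could be assembled along these lines. This is a minimal sketch against the Lucene 4.x API; the object name PhrasePrefix and its build method are hypothetical. It assumes that lowercasing and splitting on whitespace is a close enough approximation of what the index-time analyzer did to the text, which will not hold for every analyzer or input.

```scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.PrefixQuery
import org.apache.lucene.search.spans.{SpanMultiTermQueryWrapper, SpanNearQuery, SpanQuery, SpanTermQuery}

// Hypothetical helper, written against the Lucene 4.x API.
object PhrasePrefix {

  /** Treats every token but the last as a full term and the last as a prefix.
    * Returns None for blank input.
    * Assumption: lowercasing + whitespace splitting approximates the analyzer. */
  def build(field: String, input: String): Option[SpanQuery] = {
    val tokens = input.trim.toLowerCase.split("\\s+").filter(_.nonEmpty)

    // Explicit SpanQuery return type so that map yields Array[SpanQuery].
    def fullTerm(text: String): SpanQuery = new SpanTermQuery(new Term(field, text))

    if (tokens.isEmpty) None
    else {
      val fullTerms: Array[SpanQuery] = tokens.init.map(fullTerm)
      val prefix: SpanQuery =
        new SpanMultiTermQueryWrapper[PrefixQuery](new PrefixQuery(new Term(field, tokens.last)))

      if (fullTerms.isEmpty) Some(prefix) // single token: a plain prefix match
      else Some(new SpanNearQuery(fullTerms :+ prefix, 0, true)) // slop = 0, inOrder = true
    }
  }
}
```

With this sketch, PhrasePrefix.build("title", "cat i") yields a SpanNearQuery with two clauses, while PhrasePrefix.build("title", "ca") yields just the wrapped PrefixQuery. Since the result is an ordinary Query, it could presumably be wrapped in a ConstantScoreQuery in place of the parsed title query from the question.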


Source: https://stackoverflow.com/questions/29350887/lucene-phrase-match-with-a-wildcard-at-the-end
