Lucene - Querying multiple terms in a field

Submitted by 孤街浪徒 on 2019-12-12 03:32:32

Question


For simplicity's sake, consider two documents with the following fields and values:

RecordId: "12345"
CreatedAt: "27/02/1992"
Event: "Manchester, Dubai, Paris"
Event: "Manchester, Rome, Madrid"
Event: "Madrid, Sidney"


RecordId: "99999"
CreatedAt: "27/02/1992"
Event: "Manchester, Barcelona, Rome"
Event: "Rome, Paris"
Event: "Milan, Barcelona"

Is it possible to search for multiple terms within a single instance of an "Event" field?

Let's say I want to search for "Manchester" and "Paris" appearing in the same field instance. The second record contains both "Manchester" and "Paris", but in different instances of the Event field, so it should not be part of the result set.

Ideally, the result set would contain only the first record (12345).


Answer 1:


This depends on the analyser you use for the field: it would need to tokenise the text and remove the punctuation. If it does, you could use a sloppy phrase query.

"manchester paris"~2 should find just 12345. Depending on the number and order of values in each field you may need to use a larger slop.

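The slop defines the maximum number of "operations" (position moves) that may be applied to the phrase for it still to match. These can be reorderings or additional terms within the phrase.

So "x y"~1 could match

  • "x fred y" (one additional term between the two)
  • but not "y x": in Lucene, swapping two adjacent terms costs 2, so it needs "x y"~2
  • and not "y fred x" (a swap plus an addition, which needs "x y"~3)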
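The slop arithmetic can be checked with a small model. This is only a sketch in plain Python, not Lucene code: `required_slop` is a hypothetical helper that assumes a two-term phrase with distinct terms and whitespace tokenisation, and it follows the position-based edit distance described in the Lucene `PhraseQuery` documentation, under which a plain swap of two adjacent terms costs 2.

```python
def required_slop(phrase, text):
    """Smallest slop at which a two-term phrase would match `text`, or None.

    Hypothetical helper modelling position-based slop; not part of Lucene.
    """
    first, second = phrase.lower().split()
    tokens = text.lower().split()
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    if not first_pos or not second_pos:
        return None
    # In the query, `second` sits exactly one position after `first`.
    # The cost is how far the offset found in the text deviates from that.
    return min(abs((s - f) - 1) for f in first_pos for s in second_pos)


for text in ("x y", "x fred y", "y x", "y fred x"):
    print(repr(text), "needs slop", required_slop("x y", text))
# 'x y' needs slop 0
# 'x fred y' needs slop 1
# 'y x' needs slop 2
# 'y fred x' needs slop 3
```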

For your needs, the slop probably ought to be equal to the maximum number of terms allowed in a field. I haven't worked it through, but I think that would suffice even if you query for more than two terms.
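One caveat when raising the slop: the values of a multi-valued field share a single sequence of token positions, so a large slop can reach across two Event instances unless the analyser leaves a gap between them (in Lucene this is what `Analyzer.getPositionIncrementGap` controls; its default is 0). The sketch below models that in plain Python on the two sample records. The helpers `positions`, `sloppy_match` and `hits` are hypothetical, the tokeniser is a simple stand-in for a real analyser, and the gap value of 100 is just an illustrative choice.

```python
import re

records = {
    "12345": ["Manchester, Dubai, Paris", "Manchester, Rome, Madrid", "Madrid, Sidney"],
    "99999": ["Manchester, Barcelona, Rome", "Rome, Paris", "Milan, Barcelona"],
}


def positions(values, gap):
    """Assign token positions across all values of one multi-valued field."""
    pos, out = 0, []
    for n, value in enumerate(values):
        if n > 0:
            pos += gap  # extra distance inserted between field instances
        for token in re.findall(r"\w+", value.lower()):
            out.append((token, pos))
            pos += 1
    return out


def sloppy_match(values, first, second, slop, gap):
    """True if the two-term phrase `first second` matches within `slop`."""
    toks = positions(values, gap)
    firsts = [p for t, p in toks if t == first]
    seconds = [p for t, p in toks if t == second]
    return any(abs((s - f) - 1) <= slop for f in firsts for s in seconds)


def hits(slop, gap):
    return [rid for rid, events in records.items()
            if sloppy_match(events, "manchester", "paris", slop, gap)]


print(hits(slop=2, gap=0))    # ['12345']
print(hits(slop=3, gap=0))    # ['12345', '99999']  (match spans two instances)
print(hits(slop=3, gap=100))  # ['12345']
```

In this model, a slop of 3 (the largest number of terms in any Event value here) only stays within one instance once the gap is larger than the slop.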




Answer 2:


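How about indexing Event as a non-tokenized field and using a KeywordAnalyzer for it? You could then use Lucene's regexp query to match values that contain both Manchester and Paris:

Event:/.*Manchester.+Paris.*/

Lucene's regexp syntax always matches against the whole term, so no ^ or $ anchors are needed. As written, the pattern only matches values in which Manchester comes before Paris.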
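To illustrate the idea of matching each Event value as one untokenized term, here is a rough sketch in plain Python. It is an approximation, not Lucene: `re.fullmatch` stands in for Lucene's whole-term regexp matching (which is why the pattern carries no `^` or `$`), the two regex dialects differ in other details, and `matching_records` is a hypothetical helper. The pattern adds an alternative so that either order of the two cities is accepted.

```python
import re

records = {
    "12345": ["Manchester, Dubai, Paris", "Manchester, Rome, Madrid", "Madrid, Sidney"],
    "99999": ["Manchester, Barcelona, Rome", "Rome, Paris", "Milan, Barcelona"],
}

# Both cities must occur inside the same value, in either order.
BOTH_CITIES = re.compile(r".*Manchester.+Paris.*|.*Paris.+Manchester.*")


def matching_records(pattern):
    """Ids of records where at least one whole Event value matches."""
    return [rid for rid, events in records.items()
            if any(pattern.fullmatch(value) for value in events)]


print(matching_records(BOTH_CITIES))  # ['12345']
```

Because every value is matched on its own, a record such as 99999, where the two cities sit in different Event instances, is not returned. Matching is case-sensitive, which fits an analyser that does not lowercase.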


Source: https://stackoverflow.com/questions/35765855/lucene-querying-multiple-terms-in-a-field
