In my Elasticsearch index I have documents that have multiple tokens at the same position.
I want to get a document back when I match at least one token at every position. The order of the tokens is not important. How can I accomplish that? I use Elasticsearch 0.90.5.
Example:
I index a document like this:
{ "field":"red car" }
I use a synonym token filter that adds synonyms at the same positions as the original token. So now in the field, there are 2 positions:
- Position 1: "red"
- Position 2: "car", "automobile"
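To make the matching rule I am after concrete, here is a minimal Python sketch of it, outside Elasticsearch. The list-of-sets model of the positions and the function name `matches_every_position` are only for illustration, and the query is split on whitespace with no analysis applied.

```python
def matches_every_position(positions, query_tokens):
    """Return True if every position has at least one token in the query."""
    query = set(query_tokens)
    # A position counts as matched when it shares any token with the query;
    # the order of the query tokens plays no role.
    return all(tokens & query for tokens in positions)


# Position 1: "red"; Position 2: "car", "automobile"
doc_positions = [{"red"}, {"car", "automobile"}]

no_red = matches_every_position(doc_positions, "a car is an automobile".split())
any_order = matches_every_position(doc_positions, "automobile red".split())
print(no_red, any_order)
```

The first query has nothing at position 1, so it should be rejected; the second covers both positions in reverse order and should be accepted.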
My solution for now:
To be able to ensure that all positions match, I index the maximum position as well.
{ "field":"red car", "max_position": 2 }
I have a custom similarity that extends DefaultSimilarity and returns 1 for tf(), idf() and lengthNorm(). The resulting score is the number of matching terms in the field.
Query:
{
  "custom_score": {
    "query": { "match": { "field": "a car is an automobile" } },
    "_script": "_score*100/doc[\"max_position\"]+_score"
  },
  "min_score": "100"
}
Problem with my solution:
The above search should not match the document, because there is no token "red" in the query string. But it does match: Elasticsearch counts the hits for "car" and "automobile" as two separate matches, even though both tokens sit at position 2. That gives a score of 2 and a script score of 2*100/2+2 = 102, which satisfies the "min_score" of 100.
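The arithmetic can be reproduced with a small Python sketch. It assumes the score is exactly the number of matching terms, as described for my custom similarity, and ignores any other Lucene scoring factors. The helper `script_score` and the position-based count are hypothetical: the latter only shows the number I would need, not something Elasticsearch computes for me.

```python
def script_score(score, max_position):
    # Mirrors the script: _score*100/doc["max_position"]+_score
    return score * 100 / max_position + score


doc_positions = [{"red"}, {"car", "automobile"}]
query = set("a car is an automobile".split())
max_position = len(doc_positions)

# What happens now: every matching term adds 1 to the score.
matched_terms = sum(len(tokens & query) for tokens in doc_positions)

# What I want: every matching position adds 1, however many of its tokens hit.
matched_positions = sum(1 for tokens in doc_positions if tokens & query)

term_based = script_score(matched_terms, max_position)
position_based = script_score(matched_positions, max_position)
print(term_based, position_based)
```

Counting terms gives 102, which passes the "min_score" of 100. Counting positions would give 51, and the document would be filtered out as intended.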