Elasticsearch nested filter returns empty result

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-11 05:41:03

Question


I have this mapping:

  "post": {
    "model": "Post",
    "properties": {
      "id": {
        "type": "integer"
      },
      "title": {
        "type": "string",
        "analyzer": "custom_analyzer",
        "boost": 5
      },
      "description": {
        "type": "string",
        "analyzer": "custom_analyzer",
        "boost": 4
      },
      "condition": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "categories": {
        "type": "string",
        "index": "not_analyzed"
      },
      "seller": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "integer",
            "index": "not_analyzed"
          },
          "username": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 1
          },
          "firstName": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 3
          },
          "lastName": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 2
          }
        }
      },
      "marketPrice": {
        "type": "float",
        "index": "not_analyzed"
      },
      "currentPrice": {
        "type": "float",
        "index": "not_analyzed"
      },
      "discount": {
        "type": "float",
        "index": "not_analyzed"
      },
      "commentsCount": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "likesCount": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "featured": {
        "type": "boolean",
        "index": "not_analyzed"
      },
      "bumped": {
        "type": "boolean",
        "index": "not_analyzed"
      },
      "created": {
        "type": "date",
        "index": "not_analyzed"
      },
      "modified": {
        "type": "date",
        "index": "not_analyzed"
      }
    }
  }

And this query:

GET /develop/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "filtered" : {
        "query": {
          "bool": {
            "must": [
              { "match": { "title": "post" }}
            ]
          }
        },
        "filter": {
          "bool": { 
            "must": [
              {"term": {
                "featured": 0
              }},
              { 
              "nested": {
                "path": "seller",
                "filter": {
                  "bool": {
                    "must": [
                      { "term": { "seller.firstName": "Test 3" } }
                    ]
                  }
                },
                "_cache" : true
              }}
            ]
          } 
        }
    }
  },
  "sort": [
    {
      "_score":{
        "order": "desc"
      }
    },{
      "created": {
        "order": "desc"
      }
    }
  ],
  "track_scores": true
}

I expect 25 results because I have 25 posts indexed, but I get an empty set. If I remove the nested filter, everything works fine. I want to be able to filter on the nested object.

EDIT:

In my settings I have:

    "analyzer": {
      "custom_analyzer": {
        "type": "custom",
        "tokenizer": "nGram",
        "filter": [
          "stopwords",
          "asciifolding",
          "lowercase",
          "snowball",
          "english_stemmer",
          "english_possessive_stemmer",
          "worddelimiter"
        ]
      },
      "custom_search_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "stopwords",
          "asciifolding",
          "lowercase",
          "snowball",
          "english_stemmer",
          "english_possessive_stemmer",
          "worddelimiter"
        ]
      }
    }

What am I missing here?

Thanks


Answer 1:


Short version: try this (after updating the endpoint and index name):

curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
   "query": {
      "filtered": {
         "query": {
            "bool": {
               "must": [
                  {
                     "match": {
                        "title": "post"
                     }
                  }
               ]
            }
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "nested": {
                        "path": "seller",
                        "filter": {
                           "bool": {
                              "must": [
                                 {
                                    "terms": {
                                       "seller.firstName": [
                                          "test",
                                          "3"
                                       ],
                                       "execution": "and"
                                    }
                                 }
                              ]
                           }
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}'

It worked for me with a simplified version of your setup. I'll post an edit with a longer explanation in a little while.
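If you assemble this request in application code, the same body can be built as a Python dict. This is a minimal sketch assuming the Elasticsearch 1.x query DSL used throughout this thread (the `filtered` query and the `terms` filter's `execution` option); `build_seller_query` is a hypothetical helper name, and it only builds the JSON body rather than sending it. It also assumes that, for a simple name like "Test 3", lowercasing and splitting on whitespace yields the same words the analyzer indexes.

```python
import json


def build_seller_query(title, first_name_tokens):
    # Hypothetical helper: returns the same body as the curl request above.
    # The terms filter is NOT analyzed, so the tokens must already be in the
    # form the index stores them (lowercased words for this analyzer).
    seller_filter = {
        "terms": {
            "seller.firstName": list(first_name_tokens),
            "execution": "and",  # every listed token must be present
        }
    }
    nested = {
        "nested": {
            "path": "seller",
            "filter": {"bool": {"must": [seller_filter]}},
        }
    }
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": [{"match": {"title": title}}]}},
                "filter": {"bool": {"must": [nested]}},
            }
        }
    }


# Assumption: for a simple name like "Test 3", lowercasing and splitting on
# whitespace gives the same words the analyzer indexes. Names with stopwords
# or stemmable words may differ; check those with the _analyze API.
tokens = "Test 3".lower().split()
body = build_seller_query("post", tokens)
print(json.dumps(body, indent=2))
```

Keeping the tokenization step separate from the query builder makes it easy to swap in tokens obtained from the real analyzer instead of the whitespace-split approximation.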

EDIT: long version:

The problem with your query is the combination of your analyzer and the term filter. Your analyzer breaks the text of the firstName field into tokens, so "Test 3" is indexed as tokens such as "test" and "3", but never as the single token "Test 3". A term filter is not analyzed, so { "term": { "seller.firstName": "Test 3" } } says: find a document where one of the tokens for "seller.firstName" is exactly "Test 3". No document satisfies that (in fact, none can, given how your analyzer is set up). For the same reason, values passed to term or terms filters must match the indexed tokens exactly, including case, which is why the terms filter above uses lowercase "test". You could set "index": "not_analyzed" on that field, and then your query would work, or you can use a terms filter as I showed above. Here's how I got there:
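The matching rule described above can be modeled in a few lines of Python. This is a simplified sketch, not Elasticsearch code: `simple_tokens` is a hypothetical stand-in for the analyzer that only lowercases and splits on non-alphanumeric characters (the real custom_analyzer emits many more tokens), and the two match functions mimic an unanalyzed term filter and a terms filter with "execution": "and".

```python
import re


def simple_tokens(text):
    # Rough stand-in for the analyzer: lowercase, split on non-alphanumerics.
    # The real custom_analyzer (nGram + stemmers + word_delimiter) emits more
    # tokens, but like this one it never emits "Test 3" as a single token.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def term_matches(indexed_tokens, value):
    # A term filter is not analyzed: the value must equal one indexed token.
    return value in indexed_tokens


def terms_and_matches(indexed_tokens, values):
    # terms with "execution": "and": every value must be an indexed token.
    return all(v in indexed_tokens for v in values)


docs = {"1": "Test 1", "2": "Test 2", "3": "Test 3"}
index = {doc_id: simple_tokens(name) for doc_id, name in docs.items()}

print([d for d, toks in index.items() if term_matches(toks, "Test 3")])            # []
print([d for d, toks in index.items() if terms_and_matches(toks, ["test", "3"])])  # ['3']
```

The first filter finds nothing because no document has the token "Test 3"; the second requires both "test" and "3", which only document 3 has.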

I started with the index definition you linked to in your comment and simplified it a little to make it more readable while preserving the essential issue:

curl -XDELETE "http://localhost:9200/my_index"

curl -XPUT "http://localhost:9200/my_index" -d'
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
         "filter": {
            "snowball": { "type": "snowball", "language": "English" },
            "english_stemmer": { "type": "stemmer", "language": "english" },
            "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
            "stopwords": { "type": "stop",  "stopwords": [ "_english_" ] },
            "worddelimiter": { "type": "word_delimiter" }
         },
         "tokenizer": {
            "nGram": { "type": "nGram", "min_gram": 3, "max_gram": 20 }
         },
         "analyzer": {
            "custom_analyzer": {
               "type": "custom",
               "tokenizer": "nGram",
               "filter": [
                  "stopwords",
                  "asciifolding",
                  "lowercase",
                  "snowball",
                  "english_stemmer",
                  "english_possessive_stemmer",
                  "worddelimiter"
               ]
            },
            "custom_search_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "stopwords",
                  "asciifolding",
                  "lowercase",
                  "snowball",
                  "english_stemmer",
                  "english_possessive_stemmer",
                  "worddelimiter"
               ]
            }
         }
      }
   },
   "mappings": {
      "posts": {
         "properties": {
            "title": {
               "type": "string",
               "analyzer": "custom_analyzer",
               "boost": 5
            },
            "seller": {
               "type": "nested",
               "properties": {
                  "firstName": {
                     "type": "string",
                     "analyzer": "custom_analyzer",
                     "boost": 3
                  }
               }
            }
         }
      }
   }
}'
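To see why "Test 3" can never end up as an indexed token, here is a rough Python approximation of the analysis chain above: an nGram tokenizer with min_gram 3 and max_gram 20, followed by lowercasing and a word_delimiter-style split on non-alphanumeric characters. It is an assumption-laden sketch, not Elasticsearch's implementation: it skips the stop, snowball and stemmer filters and does not reproduce token order or offsets. The `_analyze` API against the real index is the authoritative check.

```python
import re


def ngrams(text, min_gram=3, max_gram=20):
    # Approximates the nGram tokenizer configured above: every substring of
    # length min_gram..max_gram, with spaces kept (no token_chars set).
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break
            grams.append(text[start:start + size])
    return grams


def split_words(token):
    # Rough word_delimiter stand-in: split on non-alphanumeric characters.
    return [part for part in re.split(r"[^0-9A-Za-z]+", token) if part]


raw = ngrams("Test 3")
# Lowercase and split each gram; stop and stemming filters are skipped here.
indexed = sorted({part.lower() for gram in raw for part in split_words(gram)})
print("Test 3" in raw)       # True: the tokenizer does produce this gram
print(indexed)               # ['3', 'est', 'st', 't', 'tes', 'test']
print("test 3" in indexed)   # False: the word_delimiter step splits it apart
```

The tokenizer does emit the gram "Test 3", but word_delimiter breaks it into "test" and "3" before indexing, which is why an unanalyzed term filter for "Test 3" matches nothing.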

Then I added a few test docs:

curl -XPUT "http://localhost:9200/my_index/posts/1" -d'
{"title": "post", "seller": {"firstName":"Test 1"}}'
curl -XPUT "http://localhost:9200/my_index/posts/2" -d'
{"title": "post", "seller": {"firstName":"Test 2"}}'
curl -XPUT "http://localhost:9200/my_index/posts/3" -d'
{"title": "post", "seller": {"firstName":"Test 3"}}'
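Instead of one PUT per document, the same three test documents could be loaded in a single request through the `_bulk` API. This is a minimal sketch that only builds the newline-delimited body and sends nothing; `bulk_body` is a hypothetical helper, and the action metadata with `_type` assumes the Elasticsearch 1.x-era index layout used in this answer.

```python
import json


def bulk_body(index, doc_type, docs):
    # Build a newline-delimited _bulk request body: one action line, then
    # one source line per document. The body must end with a newline.
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


docs = [
    (str(i), {"title": "post", "seller": {"firstName": "Test %d" % i}})
    for i in (1, 2, 3)
]
print(bulk_body("my_index", "posts", docs), end="")
```

Saved to a file (the name docs.ndjson is only an example), it could be sent with `curl -XPOST "http://localhost:9200/_bulk" --data-binary @docs.ndjson`; `--data-binary` matters because `-d @file` strips the newlines the bulk format depends on.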

Then I ran a simplified version of your query with the basic structure still intact, but with a terms filter instead of a term filter:

curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
   "query": {
      "filtered": {
         "query": {
            "bool": {
               "must": [
                  {
                     "match": {
                        "title": "post"
                     }
                  }
               ]
            }
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "nested": {
                        "path": "seller",
                        "filter": {
                           "bool": {
                              "must": [
                                 {
                                    "terms": {
                                       "seller.firstName": [
                                          "test",
                                          "3"
                                       ],
                                       "execution": "and"
                                    }
                                 }
                              ]
                           }
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}'
...
{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 6.085842,
      "hits": [
         {
            "_index": "my_index",
            "_type": "posts",
            "_id": "3",
            "_score": 6.085842,
            "_source": {
               "title": "post",
               "seller": {
                  "firstName": "Test 3"
               }
            }
         }
      ]
   }
}

which appears to return what you want.

Here is the code I used:

http://sense.qbox.io/gist/041dd929106d27ea606f48ce1f86076c52faec91



Source: https://stackoverflow.com/questions/27787614/elasticsearch-nested-filter-return-empty-result
