“match” query along with “should” clause giving more than required match results in Elasticsearch

I have written the following Lucene query in Elasticsearch to get the documents whose Id field has the given values:

GET requirements_v3/_search
{
  "from": 0,


                      
2 answers

时光取名叫无心, 2020-12-20 04:04

Since your Id field is mapped as both text and keyword, use Id.keyword to match exact values, for example:

GET requirements_v3/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "match": {
                "Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
              }
            },
            {
              "match": {
                "Id.keyword": "048b7907-2b5a-438a-ace9-f1e1fd67ca69"
              }
            }
          ]
        }
      }
    }
  }
}


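If you want to match multiple exact values, though, a terms query is the better fit (see the Elasticsearch documentation for the terms query). Because terms does not analyze its input, point it at the keyword sub-field. For example:

{
    "terms" : {
        "Id.keyword" : ["b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b", "048b7907-2b5a-438a-ace9-f1e1fd67ca69"]
    }
}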
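
If you build the request from application code, the terms clause can be wrapped in the same from/size/filter structure as the query above. Here is a minimal sketch in Python that only builds and prints the request body; the helper name build_terms_search is made up for illustration, and sending the body to Elasticsearch is left to whatever client you use.

```python
import json


def build_terms_search(field, values, start=0, size=10):
    """Build a search body that filters on exact values of `field`.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "from": start,
        "size": size,
        "query": {
            "bool": {
                # filter context: no scoring, exact-value lookup
                "filter": {"terms": {field: list(values)}}
            }
        },
    }


ids = [
    "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
    "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
]
body = build_terms_search("Id.keyword", ids)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET requirements_v3/_search to try it out.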

                                                                        
                                                        
            
            
              
                
广开言路, 2020-12-20 04:08

Let's understand this using the following mapping as an example:

{
  "_doc": {
    "properties": {
      "Id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}


The above mapping is created dynamically by Elasticsearch. Let us now focus on the Id field. Its type is text, and the default analyzer for the text datatype is the standard analyzer. When this analyzer is applied to the input for this field, the input is tokenized into terms. For example, if the input value for Id is 33f87d98-024f-4893-aa1c-8d438a98cd1f, the following tokens are generated:

33f87d98
024f
4893
aa1c
8d438a98cd1f


As you can see, the input value is split on the - character, which acts as a delimiter. This is because the standard analyzer is applied to it.
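
To see the split outside Elasticsearch, here is a rough Python sketch. It is only an approximation I am assuming for UUID-like input (lowercase, then split on anything that is not a letter or digit); the real standard analyzer follows Unicode text segmentation rules and can behave differently on other text.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


tokens = approx_standard_analyze("33f87d98-024f-4893-aa1c-8d438a98cd1f")
for tok in tokens:
    print(tok)
```

For the authoritative token list, run the same value through the _analyze API on your own index.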

There is also a sub-field under Id named keyword, whose type is keyword. For the keyword type, the input is indexed as is, without any modification.

Now let's understand why more documents match and the result count is higher than expected. In your query you used a match query on the Id field, as below:

{
  "match": {
    "Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
  }
}


By default, a match query uses the same analyzer that is applied to the field in the mapping. So the same analyzer is applied again to the Id value in the query, and the input is split into tokens in the same way as above. The default operator applied between the tokens of a match query's input string is OR, and hence your query effectively becomes:

b8bf49a4 OR 960b OR 4fa8 OR 8c5f OR a3fce4b4d07b

Therefore, if any of the above tokens matches any of the indexed terms stored in the Id field, the document is considered a match.
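
The effect can be mimicked in a few lines of Python. This is a sketch under stated assumptions: the tokenizer is a simplified stand-in for the standard analyzer, and the second ID is a made-up value that shares only the 4fa8 group with the ID being searched for.

```python
import re


def approx_tokens(text):
    # Simplified stand-in for the standard analyzer (assumption):
    # lowercase and split on anything that is not a letter or digit.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def match_or(query_text, indexed_text):
    # Default OR operator: one shared token is enough for a hit.
    return bool(approx_tokens(query_text) & approx_tokens(indexed_text))


wanted = "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
# Hypothetical ID of a different document that shares the "4fa8" group.
other = "11111111-2222-4fa8-3333-444444444444"

print(match_or(wanted, wanted))  # the document you asked for
print(match_or(wanted, other))   # a different document, matched via "4fa8"
print(wanted == other)           # whole-value comparison, as on a keyword field
```

The second call shows how an unrelated document ends up in the results, while the whole-value comparison on the last line rejects it.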

Solution, based on the above mapping:

Use the keyword sub-field instead, so the query becomes:

{
  "match": {
    "Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
  }
}


For more on how match works, see the Elasticsearch documentation for the match query.

Also, as mentioned by @Curious_MInd in his answer, it's better to use terms than multiple match clauses in should.
                                                                        
                                                        
            
            
              
                