Elasticsearch - query primary and secondary attribute with different terms

荒凉一梦 submitted on 2019-12-11 16:23:59

Question


I'm using Elasticsearch to query data that was originally exported from several relational databases with a lot of redundancies. I now want to run queries with a primary attribute and one or more secondary attributes that should match. I tried a bool query with a must clause and a should clause, but that doesn't seem to work for my case, which looks like this:

Example:

I have a document with the full name and street name of a user, and I want to search for similar users in different indices. The best match for my query should be the best match on the fullname field combined with the best match on the street field. But since the original data has a lot of redundancies and inconsistencies, the fullname field (which I built manually from the fields name1, name2, name3) may contain the same name multiple times, and it seems that Elasticsearch ranks a double match in a must field higher than a match in a should field.

For example, I want to query for John Doe, Back Street against the following sample data:

{
    "fullname" : "John Doe John and Jane",
    "street" : "Main Street"
}
{
    "fullname" : "John Doe",
    "street" : "Back Street"
}

Long story short: I want to query for a primary attribute (fullname: John Doe) and a secondary attribute (street: Back Street), and I want the second document to be the best match, not the first one just because it contains John multiple times.


Answer 1:


Manipulating relevance in Elasticsearch is not easy. The score calculation is based on three main factors:

  • Term frequency
  • Inverse document frequency
  • Field-length norm

In short:

  • the more often a term occurs in the field, the MORE relevant the document is
  • the more often a term occurs across the entire index, the LESS relevant it is
  • the shorter the field, the MORE relevant a match in it is
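
To make the three factors concrete, here is a rough Python sketch of the classic Lucene TF/IDF building blocks applied to the two fullname values from the question. The function names are made up for illustration, and tokenization is a naive lowercase whitespace split rather than a real analyzer. Recent Elasticsearch versions use BM25 by default, so real scores will differ; the sketch only shows the direction in which each factor pushes.

```python
import math

def tokenize(text):
    # Naive stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()

def term_frequency(term, tokens):
    # Classic Lucene TF: square root of the number of occurrences in the field.
    return math.sqrt(tokens.count(term))

def inverse_document_frequency(term, docs):
    # Classic Lucene IDF: 1 + ln(numDocs / (docFreq + 1)).
    doc_freq = sum(1 for tokens in docs if term in tokens)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def field_length_norm(tokens):
    # Classic Lucene norm: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(len(tokens))

# The two fullname values from the question's sample data.
fullnames = [tokenize("John Doe John and Jane"), tokenize("John Doe")]

for tokens in fullnames:
    print(
        tokens,
        "tf(john)=%.3f" % term_frequency("john", tokens),
        "idf(john)=%.3f" % inverse_document_frequency("john", fullnames),
        "norm=%.3f" % field_length_norm(tokens),
    )
```

In this toy setup the repeated John raises the term frequency of the first document, while its longer fullname field lowers its field-length norm.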

I recommend reading the following material:

  • What Is Relevance?
  • Theory Behind Relevance Scoring
  • Controlling Relevance (and its subpages)

If, in general, a match on fullname matters more in your case than a match on street, you can boost the importance of the former. Below is an example based on my working code:

{
  "query": {
    "multi_match": {
      "query": "john doe",
      "fields": [
        "fullname^10",
        "street"
      ]
    }
  }
}

In this example, a match on fullname is weighted ten times (^10) higher than a match on street. You can experiment with the boost value or use other ways to control relevance but, as I mentioned at the beginning, it is not easy and everything depends on your particular situation. This is mostly because of the inverse document frequency part, which takes terms from the entire index into account: each document added to the index will probably change the score of the same search query.
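
Since the question uses different terms for each field, one alternative worth trying is a bool query with a separate match clause per field, each with its own boost. The Python sketch below only builds the request body as a dict. The helper name build_query and the default boost values are assumptions for illustration, and whether this ranks the second sample document first depends on your mapping, similarity settings and the rest of the index.

```python
import json

def build_query(fullname, street, fullname_boost=1.0, street_boost=1.0):
    # Hypothetical helper: the primary attribute is required (must),
    # the secondary attribute only adds to the score (should).
    # Both boosts are placeholders to experiment with.
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"fullname": {"query": fullname, "boost": fullname_boost}}}
                ],
                "should": [
                    {"match": {"street": {"query": street, "boost": street_boost}}}
                ],
            }
        }
    }

body = build_query("John Doe", "Back Street", fullname_boost=1.0, street_boost=2.0)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Raising street_boost relative to fullname_boost gives the secondary attribute more weight in the final score, which may help when repeated names inflate the fullname match.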

I know I did not answer your question directly, but I hope this helps you understand how scoring works.



Source: https://stackoverflow.com/questions/49617053/elasticsearch-query-primary-and-secondary-attribute-with-different-terms
