Skip duplicates on field in an Elasticsearch search result

Submitted by ☆樱花仙子☆ on 2021-02-11 12:29:31

Question


Is it possible to remove duplicates on a given field?

For example, the following query:

{
  "query": {
    "term": {
      "name_admin": {
        "value": "nike"
      }
    }
  },
  "_source": [
    "name_admin",
    "parent_sku",
    "sku"
  ],
  "size": 2
}

is retrieving

"hits" : [
      {
        "_index" : "product",
        "_type" : "_doc",
        "_id" : "central30603",
        "_score" : 4.596813,
        "_source" : {
          "parent_sku" : "SSP57",
          "sku" : "SSP57816401",
          "name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
        }
      },
      {
        "_index" : "product",
        "_type" : "_doc",
        "_id" : "central156578",
        "_score" : 4.596813,
        "_source" : {
          "parent_sku" : "SSP57",
          "sku" : "SSP57816395",
          "name_admin" : "NIKE U NSW PRO CAP NIKE AIR"
        }
      }
    ]

I'd like to skip duplicates on parent_sku so that I get only one result per parent_sku, the way suggesters allow it with something like "skip_duplicates": true.

I know I could achieve this with an aggregation, but I'd like to stick with a search: my query is a bit more complicated, and I'm using the scroll API, which doesn't work with aggregations.


Answer 1:


Field collapsing should help here:

{
  "query": {
    "term": {
      "name_admin": {
        "value": "nike"
      }
    }
  },
  "collapse": {
    "field": "parent_sku",
    "inner_hits": {
      "name": "parent",
      "size": 1
    }
  },
  "_source": false,
  "size": 2
}

The above query will return one document per parent_sku.
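To make the shape of the collapsed result concrete, here is a minimal Python sketch that builds the same request body and pulls one representative document per parent_sku out of a response. The helper names build_collapse_query and representative_sources are hypothetical, and sample_response is a hand-written illustration of the response layout based on the documents in the question, not real server output. No client library is used; the body would be sent to the _search endpoint with whatever client you already have.

```python
import json


def build_collapse_query(value, size=2):
    """Build the search body from the answer: a term query collapsed on parent_sku."""
    return {
        "query": {"term": {"name_admin": {"value": value}}},
        "collapse": {
            "field": "parent_sku",
            # inner_hits carries the representative document for each group
            "inner_hits": {"name": "parent", "size": 1},
        },
        "_source": False,  # top-level hits carry no _source; it is read from inner_hits
        "size": size,
    }


def representative_sources(response, inner_name="parent"):
    """Map each collapsed parent_sku value to the _source of its first inner hit."""
    result = {}
    for hit in response["hits"]["hits"]:
        # The collapse key is returned under "fields" as a single-element list.
        key = hit["fields"]["parent_sku"][0]
        inner = hit["inner_hits"][inner_name]["hits"]["hits"]
        if inner:
            result[key] = inner[0]["_source"]
    return result


# Hand-written illustration of a collapsed response (assumed layout, trimmed).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "product",
                "_id": "central30603",
                "fields": {"parent_sku": ["SSP57"]},
                "inner_hits": {
                    "parent": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "central30603",
                                    "_source": {
                                        "parent_sku": "SSP57",
                                        "sku": "SSP57816401",
                                        "name_admin": "NIKE U NSW PRO CAP NIKE AIR",
                                    },
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

if __name__ == "__main__":
    print(json.dumps(build_collapse_query("nike"), indent=2))
    print(representative_sources(sample_response))
```

Two caveats worth checking against the documentation for your version: the collapse field needs to be a single-valued keyword or numeric field with doc_values enabled, and, as far as I know, collapse cannot be combined with the scroll API, so paging through collapsed results would need a different mechanism.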



Source: https://stackoverflow.com/questions/62152287/skip-duplicates-on-field-in-a-elasticsearch-search-result
