Extract record from multiple arrays based on a filter

Submitted by 人盡茶涼 on 2019-12-18 07:14:46

Question


I have documents in Elasticsearch with the following structure:

"_source": {
          "last_updated": "2017-10-25T18:33:51.434706",
          "country": "Italia",
          "price": [
            "€ 139",
            "€ 125",
            "€ 120",
            "€ 108"
          ],
          "max_occupancy": [
            2,
            2,
            1,
            1
          ],
          "type": [
            "Type 1",
            "Type 1 - (Tag)",
            "Type 2",
            "Type 2 (Tag)"
          ],
          "availability": [
            10,
            10,
            10,
            10
          ],
          "size": [
            "26 m²",
            "35 m²",
            "47 m²",
            "31 m²"
          ]
        }
      }

Basically, each record's details are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record share the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is no 3, take one with 4, and so on) and that has the lowest price, and place the result into a new JSON object like the following:

{
          "last_updated": "2017-10-25T18:33:51.434706",
          "country": "Italia",
          "price": "€ 125",
          "max_occupancy": 2,
          "type": "Type 1 - (Tag)",
          "availability": 10,
          "size": "35 m²"
}  

Basically, the result should contain the extracted record (in this case the second element of every array) plus the general information (the fields "last_updated" and "country").

Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
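
For illustration only, the selection rule described above can be sketched client-side in Python. This is a minimal sketch, not an Elasticsearch query: the function names `parse_price` and `extract_record` are hypothetical, and it assumes every price string contains one plain number (no thousands separators).

```python
import re

def parse_price(text):
    """Return the number inside a price string such as '€ 125' as a float."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())

def extract_record(source, min_occupancy=2):
    """Return the general fields plus the cheapest record with the smallest
    max_occupancy that is >= min_occupancy, or None if no record qualifies."""
    occupancy = source["max_occupancy"]
    candidates = [i for i, occ in enumerate(occupancy) if occ >= min_occupancy]
    if not candidates:
        return None
    # Prefer the smallest qualifying occupancy, then the lowest numeric price.
    best = min(candidates,
               key=lambda i: (occupancy[i], parse_price(source["price"][i])))
    # List-valued fields are the parallel arrays; everything else is copied as-is.
    return {key: (value[best] if isinstance(value, list) else value)
            for key, value in source.items()}
```

Calling `extract_record` on the `_source` shown above selects the record at position 1 of each array, because both occupancy-2 records qualify and "€ 125" is the cheaper of the two.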

Could someone suggest the best approach?


Answer 1:


My best approach: go nested with Nested Datatype

Besides making queries easier, it is also easier to read and to understand the connections between values that are currently scattered across different arrays.

Yes, if you choose this approach you will have to edit your mapping and re-index all of your data.

What would the mapping look like? Something like this:

{
  "mappings": {
    "properties": {
      "last_updated": {
        "type": "date"
      },
      "country": {
        "type": "keyword"
      },
      "records": {
        "type": "nested",
        "properties": {
          "price": {
            "type": "keyword"
          },
          "max_occupancy": {
            "type": "long"
          },
          "type": {
            "type": "keyword"
          },
          "availability": {
            "type": "long"
          },
          "size": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

EDIT: New document structure (containing nested documents) -

{
  "last_updated": "2017-10-25T18:33:51.434706",
  "country": "Italia",
  "records": [
    {
      "price": "€ 139",
      "max_occupancy": 2,
      "type": "Type 1",
      "availability": 10,
      "size": "26 m²"
    },
    {
      "price": "€ 125",
      "max_occupancy": 2,
      "type": "Type 1 - (Tag)",
      "availability": 10,
      "size": "35 m²"
    },
    {
      "price": "€ 120",
      "max_occupancy": 1,
      "type": "Type 2",
      "availability": 10,
      "size": "47 m²"
    },
    {
      "price": "€ 108",
      "max_occupancy": 1,
      "type": "Type 2 (Tag)",
      "availability": 10,
      "size": "31 m²"
    }
  ]
}
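
As a rough sketch of the conversion step during re-indexing, the old parallel-array `_source` can be turned into this nested shape with a few lines of Python. The helper name `to_nested_doc` and the `ARRAY_FIELDS` constant are my own illustration, not part of any Elasticsearch API, and the sketch assumes all five arrays have the same length.

```python
ARRAY_FIELDS = ("price", "max_occupancy", "type", "availability", "size")

def to_nested_doc(source):
    """Convert a parallel-array document into one with a 'records' list."""
    lengths = {len(source[field]) for field in ARRAY_FIELDS}
    if len(lengths) != 1:
        raise ValueError("parallel arrays have different lengths")
    # zip(*arrays) yields one tuple per record, in ARRAY_FIELDS order.
    records = [dict(zip(ARRAY_FIELDS, values))
               for values in zip(*(source[field] for field in ARRAY_FIELDS))]
    # Keep document-level fields such as last_updated and country unchanged.
    doc = {key: value for key, value in source.items()
           if key not in ARRAY_FIELDS}
    doc["records"] = records
    return doc
```

The returned dictionary can then be indexed into the new index with whatever client you already use.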

Now it is easier to query for any specific condition with a nested query and inner hits. For example:

{
  "_source": [
    "last_updated",
    "country"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "country": "Italia"
          }
        },
        {
          "nested": {
            "path": "records",
            "query": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "records.max_occupancy": {
                        "gte": 2
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {
              "sort": {
                "records.price": "asc"
              },
              "size": 1
            }
          }
        }
      ]
    }
  }
}

Conditions are: country is Italia AND max_occupancy >= 2.

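Inner hits: sort by price in ascending order and return only the first result. Note that price is stored as a string such as "€ 125", so this sort compares text rather than numbers; it gives the expected order only while all prices have the same number of digits.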
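
To get the flat object asked for in the question, each hit's top-level `_source` still has to be merged with its inner hit on the client side. Below is a minimal Python sketch of that step. The function name `flatten_hit` is hypothetical, the sample hit is hand-written from the example document rather than real server output, and the sketch assumes the inner hit's `_source` holds only the nested object's fields (the layout I would expect from Elasticsearch 6.0 onwards).

```python
def flatten_hit(hit, path="records"):
    """Merge a hit's top-level _source with its first inner hit under `path`."""
    inner = hit["inner_hits"][path]["hits"]["hits"]
    if not inner:
        return None
    # Top-level fields first, then the fields of the selected nested record.
    return {**hit["_source"], **inner[0]["_source"]}

# Hand-written stand-in for one element of response["hits"]["hits"].
example_hit = {
    "_source": {"last_updated": "2017-10-25T18:33:51.434706",
                "country": "Italia"},
    "inner_hits": {"records": {"hits": {"hits": [
        {"_source": {"price": "€ 125", "max_occupancy": 2,
                     "type": "Type 1 - (Tag)", "availability": 10,
                     "size": "35 m²"}}
    ]}}},
}
flat = flatten_hit(example_hit)
```

Here `flat` contains "last_updated" and "country" alongside the five fields of the chosen record.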

Hope you'll find it useful



Source: https://stackoverflow.com/questions/46971933/extract-record-from-multiple-arrays-based-on-a-filter
