Elasticsearch Query on indexes whose name is matching a certain pattern

Posted by 社会主义新天地 on 2021-02-10 06:14:08

Question


I have several indexes in my Elasticsearch DB, as follows:

Index_2019_01

Index_2019_02

Index_2019_03

Index_2019_04

.
.

Index_2019_12

Suppose I want to search only on the first 3 Indexes. I mean a regular expression like this:

select count(*) from Index_2019_0[1-3] where LanguageId="English"

What is the correct way to do that in Elasticsearch?


Answer 1:


How can I query several indexes with certain names?

This can be achieved with multi-index search, a built-in capability of Elasticsearch: list the target indexes, separated by commas, in the request path. For example:

POST /index_2019_01,index_2019_02/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}

Or, using URI search:

curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'

More details are available in the Elasticsearch documentation on multi-index search. Note that Elasticsearch requires index names to be lowercase.
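The comma-separated path can also be generated rather than typed by hand. A minimal sketch, assuming the index naming scheme from the question; the helper name build_search_path is hypothetical and not part of any Elasticsearch client:

```python
def build_search_path(prefix, year, months):
    """Return a multi-index _search path for the given months.

    Hypothetical helper: it only builds a string and does not talk to
    Elasticsearch.
    """
    # Index names must be lowercase, so normalise the prefix first.
    names = [f"{prefix.lower()}_{year}_{month:02d}" for month in months]
    return "/" + ",".join(names) + "/_search"


if __name__ == "__main__":
    # First three months of 2019, as asked in the question.
    print(build_search_path("Index", 2019, range(1, 4)))
```

The resulting path can be used with the request body shown above, or with ?q=LanguageID:English appended for a URI search.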

Can I use a regex to specify index name pattern?

In short, no. Index names can be used in queries through a special "virtual" field, _index, but its use is limited. For instance, one cannot run a regexp query against the index name. The documentation states:

The _index is exposed as a virtual field — it is not added to the Lucene index as a real field. This means that you can use the _index field in a term or terms query (or any query that is rewritten to a term query, such as the match, query_string or simple_query_string query), but it does not support prefix, wildcard, regexp, or fuzzy queries.

For instance, the query from above can be rewritten as:

POST /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_index": [
              "index_2019_01",
              "index_2019_02"
            ]
          }
        },
        {
          "match": {
            "LanguageID": "English"
          }
        }
      ]
    }
  }
}

This employs a bool query and a terms query.
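If the list of indexes is computed at run time, the same request body can be assembled programmatically. A minimal sketch; the function name build_index_filtered_query is hypothetical, and the output is plain JSON that would be sent to /_search:

```python
import json


def build_index_filtered_query(indices, field, value):
    """Build a bool query that restricts a match query to the given indexes.

    Hypothetical helper: it mirrors the request body shown above and
    returns a plain dict.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # _index accepts term/terms queries; regexp is not supported.
                    {"terms": {"_index": list(indices)}},
                    {"match": {field: value}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_index_filtered_query(
        ["index_2019_01", "index_2019_02"], "LanguageID", "English"
    )
    print(json.dumps(body, indent=2))
```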

Hope that helps!




Answer 2:


Why use POST when you are not adding any data? I advise using GET in your case. Secondly, if the indexes have similar names, as in your case, you should use an index pattern, as in the query below:

GET /index_2019_*/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}

Or with curl:

curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
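Before sending a wildcard request, it can be worth checking which indexes a pattern would select. The sketch below approximates that locally with Python's fnmatch; this is only a stand-in for Elasticsearch's own matching (fnmatch also interprets ? and [...], so the comparison should be trusted for * only), and the list of index names is assumed from the question:

```python
from fnmatch import fnmatchcase

# Index names assumed from the question: one index per month of 2019.
ALL_INDICES = [f"index_2019_{month:02d}" for month in range(1, 13)]


def matching_indices(pattern, names=ALL_INDICES):
    """Return the names that a *-style wildcard pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for pattern in ("index_2019_*", "index_2019_0*"):
        print(pattern, "->", matching_indices(pattern))
```

Under these assumptions, index_2019_* selects all twelve months and index_2019_0* selects months 01 through 09, so restricting the search to exactly the first three months still means listing them explicitly.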



Answer 3:


While searching for indices using a regex is not possible, you might be able to use date math to take you a bit further.

See the Elasticsearch documentation on date math support in index names.

As an example, let's say you want the last 3 months from those indices. If we have

index_2019_01
index_2019_02
index_2019_03
index_2019_04

and today is 2019/04/20, we could use the following request to get 04, 03 and 02:

GET /<index_{now/M-0M{yyyy_MM}}>,<index_{now/M-1M{yyyy_MM}}>,<index_{now/M-2M{yyyy_MM}}>/_search

I used M-0M for the first one so that the query-construction loop doesn't need a special case for the first index.
See the docs regarding URL-encoding this request and how to include literal braces in the index name. If a client is used, the URL encoding is done for you (at least in the Python client).
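The construction loop and the URL encoding mentioned above can be sketched with the Python standard library alone. All function names here are hypothetical, the index_ prefix with an underscore is assumed from the index names in the question, and resolve_locally only imitates on the client side what Elasticsearch would resolve on the server:

```python
from datetime import date
from urllib.parse import quote


def date_math_names(prefix, months_back, fmt="yyyy_MM"):
    """Build one date math index name per month, starting at the current month."""
    # M-0M for the first entry keeps the loop free of special cases.
    return ["<" + prefix + "{now/M-" + str(i) + "M{" + fmt + "}}>"
            for i in range(months_back)]


def encode_for_url(names):
    """Percent-encode the comma-separated names for use in a request path."""
    return quote(",".join(names), safe="")


def resolve_locally(prefix, months_back, today):
    """Imitate the server-side resolution for a given date (illustration only)."""
    resolved = []
    for i in range(months_back):
        # Count whole months since year 0, step back i months, then split again.
        total = today.year * 12 + (today.month - 1) - i
        year, month0 = divmod(total, 12)
        resolved.append(f"{prefix}{year}_{month0 + 1:02d}")
    return resolved


if __name__ == "__main__":
    names = date_math_names("index_", 3)
    print("/" + encode_for_url(names) + "/_search")
    print(resolve_locally("index_", 3, date(2019, 4, 20)))
```

For 2019/04/20 the local resolution gives index_2019_04, index_2019_03 and index_2019_02, matching the example above.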



Source: https://stackoverflow.com/questions/54535171/elasticsearch-query-on-indexes-whose-name-is-matching-a-certain-pattern
