How to scale write and index's size dynamically with Elasticsearch?

Submitted by 旧巷老猫 on 2019-12-23 03:49:06

Question


I am currently exploring solutions to archive an enormous amount of documentation and provide a web search engine for it. I started by looking for a search engine and concluded that Elasticsearch was one of the best options when you have to deal with huge amounts of data. I have read that it scales easily and out of the box, and I was convinced.

Then I looked at NoSQL databases, and because of the number of options I spent more time on my research. I read several resources (NoSQL Distilled, Amazon's Dynamo paper, Google's BigTable paper, etc.) that gave me a better understanding of distributed systems in general. I also saw that most scalable NoSQL databases can automatically split a shard into two shards when it becomes too big.

Then I realized that Elasticsearch does not provide this feature. Moreover, according to the documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html

we cannot increase the number of shards of an index after its creation. This brings me to my question:

Suppose you create an index with a number of shards sized for an expected amount of traffic and data, and one day that expectation is exceeded: you no longer have enough shards to handle the write requests and the index's size. How can you manage this situation?


Answer 1:


I think I found a way; it would be nice if someone who knows Elasticsearch well could confirm that it works.

I have just read this article, and its last section inspired this idea:

http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/

The idea is to create two aliases (index_search and index_write) that initially point to the same index (let's call it index_1). Imagine that one day the number of shards in index_1 is no longer enough. In that case, we can create a new index (let's call it index_2) with the same mappings and with the number of shards we would have added to index_1 if we had been able to.

Then we update the alias index_search so it points to both index_1 and index_2, so that searches run across the two indices. We also switch index_write to index_2, so writes go only to the new shards, because the shards of index_1 are considered full.

In the future, we could add a new index (index_3) and map index_search to index_1, index_2, and index_3.

Of course, our application would always use the aliases and never the real index names, so the transition is invisible to the application and we do not have to change its code.

Example using Sense syntax:

PUT index_1
{
    "settings": {
        "number_of_shards": 1
    }
}

POST _aliases
{
    "actions": [
       {
          "add": {
             "index": "index_1",
             "alias": "index_search"
          }
       },
        {
          "add": {
             "index": "index_1",
             "alias": "index_write"
          }
       }
    ]
}

PUT index_write/article/1
{
    "title":"One first index",
    "article":"This is an article that is indexed on index_1"
}

PUT index_2
{
    "settings": {
        "number_of_shards": 2
    }
}

POST _aliases
{
    "actions": [
       {
          "add": {
             "index": "index_2",
             "alias": "index_search"
          }
       },
        {
          "add": {
             "index": "index_2",
             "alias": "index_write"
          }
       },
        {
          "remove": {
             "index": "index_1",
             "alias": "index_write"
          }
       }
    ]
}

PUT index_write/article/2
{
    "title":"One second index",
    "article":"This is an article that is indexed on index_2"
}

The problem with this solution: if you update a document that lives in index_1 while index_write points to index_2, a copy of it will be created in index_2. This means you have to search for the document first in order to find its real index before updating it. Moreover, you cannot reliably use a GET by id through index_write, since that alias no longer covers index_1.
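A sketch of that lookup-then-update workaround, in the same Sense syntax as above (the query and the update body are illustrative; the concrete index to update against is the _index field returned in the search hit):

GET index_search/article/_search
{
    "query": {
        "ids": {
            "values": ["1"]
        }
    }
}

Each hit in the response carries an "_index" field (here it would be "index_1"). The update is then issued against that concrete index rather than the alias:

POST index_1/article/1/_update
{
    "doc": {
        "title": "An updated title"
    }
}

This costs one extra round trip per update, which is the price of the multi-index alias scheme.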




Answer 2:


In that situation, we need to delete the index, recreate it with more shards, and reindex all the data.

For more information, refer to the following link:

http://m.youtube.com/watch?v=lpZ6ZajygDY
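That reindexing step can be sketched in Sense syntax, assuming a recent Elasticsearch version that provides the _reindex API (the index names and the shard count here are illustrative):

PUT index_bigger
{
    "settings": {
        "number_of_shards": 4
    }
}

POST _reindex
{
    "source": {
        "index": "index_1"
    },
    "dest": {
        "index": "index_bigger"
    }
}

Combined with the alias approach from Answer 1, the application can be pointed at index_bigger afterwards without a code change, and index_1 can then be deleted.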



Source: https://stackoverflow.com/questions/22304776/how-to-scale-write-and-indexs-size-dynamically-with-elasticsearch
