Elasticsearch snapshots

Submitted by 混江龙づ霸主 on 2021-01-24 09:46:06

Question


I have many (10+) Elasticsearch clusters, and these clusters are used for different purposes (storing logs, storing business and analytical data). For example, I have a 3-node Elasticsearch cluster used for business data (users' shopping carts on an e-commerce website). I take snapshots of it every day to an NFS share, and my admins told me that I must clear the last 10 snapshots from the snapshot repository to free disk space.

Now suppose somebody (or I) accidentally runs curl -XDELETE /*, which deletes all indices in the cluster, and I must restore all the business data that was there. If I have only the 10 snapshots from the last 10 days, can I restore all the data, or does it restore only the data from the last snapshot's date? I ask because the documentation says "Snapshots are incremental: each snapshot only stores data that is not part of an earlier snapshot."

For example, the customer Joe adds something to his cart on my website on 01/09/2020, then on 15/09/2020 I delete all data from the cluster, and my last snapshot in the snapshot repository is from 03/09/2020. If I restore from this snapshot, will it contain the old data or not?

Sorry for my bad English.


Answer 1:


An interesting test to understand this is to perform the following process:

  1. create an index
  2. index one document
  3. create a first snapshot A
  4. index a second document
  5. create a second snapshot B
  6. delete the first snapshot A
  7. delete the index
  8. restore the snapshot B

Do you think the first document is gone? Let's find out. Here are all the steps to reproduce the above process (they assume a snapshot repository named my-snapshots is already registered):

# 1. create an index
PUT test

# 2. index one document
PUT test/_doc/1
{
  "id": 1
}

# 3. create a first snapshot A
PUT /_snapshot/my-snapshots/snapshot_a?wait_for_completion=true
{
  "indices": "test",
  "ignore_unavailable": true,
  "include_global_state": false
}

# 4. index a second document
PUT test/_doc/2
{
  "id": 2
}

# 5. create a second snapshot B
PUT /_snapshot/my-snapshots/snapshot_b?wait_for_completion=true
{
  "indices": "test",
  "ignore_unavailable": true,
  "include_global_state": false
}

# 6. delete the first snapshot A
DELETE /_snapshot/my-snapshots/snapshot_a

# 7. delete the index
DELETE test

# 8. restore the snapshot B
POST /_snapshot/my-snapshots/snapshot_b/_restore

# 9. And now check the content of the index
GET test/_search

=>
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 2
        }
      }
    ]

So the bottom line is that older documents are still contained in newer snapshots, and deleting old snapshots doesn't mean deleting old documents, as long as those documents were still in the index when the newer snapshot was taken.

A snapshot logically contains all the shard segment files that exist at the moment it is created, but only the files not already stored in the repository by an earlier snapshot are physically copied; that is what "incremental" means. Over time, smaller segment files get merged into bigger ones. The next snapshot then copies the new, bigger segment files, while the older snapshots still reference the older, smaller segment files.
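To make the incremental behaviour concrete, here is a toy Python model (hypothetical, not how Elasticsearch is implemented internally): a shard is a dict of segment files, the repository is a dict of stored files, and taking a snapshot copies only the files the repository does not have yet. The segment names `_0` to `_3` and the one-document-per-segment layout are simplifying assumptions.

```python
def take_snapshot(repo, shard):
    """Copy into `repo` only the segment files it does not already hold.

    Returns (referenced, copied): every file this snapshot points at,
    and the subset that actually had to be uploaded.
    """
    copied = [name for name in sorted(shard) if name not in repo]
    for name in copied:
        repo[name] = shard[name]
    return sorted(shard), copied


repo = {}                                   # snapshot repository: file name -> contents
shard = {"_0": ["doc1"], "_1": ["doc2"]}    # live shard: segment name -> documents

snap_a, copied_a = take_snapshot(repo, shard)   # first snapshot copies _0 and _1

shard["_2"] = ["doc3"]                          # new document lands in a new segment
snap_b, copied_b = take_snapshot(repo, shard)   # copies only _2

# A merge replaces the three small segments with one bigger segment.
shard = {"_3": ["doc1", "doc2", "doc3"]}
snap_c, copied_c = take_snapshot(repo, shard)   # copies only _3

print(snap_a, copied_a)
print(snap_b, copied_b)
print(snap_c, copied_c)
print(sorted(repo))
```

Snapshot C references only `_3`, yet the repository keeps `_0`, `_1` and `_2` because snapshots A and B still point at them.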

It doesn't mean, however, that the latest snapshot holds everything you ever indexed: it contains only what was in the index when it was taken, so a document deleted or overwritten before then is not in it. Still, if you do daily snapshots, I think it's safe to keep only the last 10 snapshots and expect that all the data is there.
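A retention rule like "keep only the 10 most recent daily snapshots" can be sketched as below. The `daily-YYYY.MM.DD` naming scheme is a hypothetical convention chosen so that sorting the names also sorts them by date; the function only decides which snapshots to drop and does not talk to a cluster.

```python
from datetime import date, timedelta


def snapshots_to_delete(names, keep=10):
    """Return the snapshot names that fall outside the `keep` most recent.

    Assumes hypothetical names of the form 'daily-YYYY.MM.DD', so the
    lexicographic order of the names is also their chronological order.
    """
    ordered = sorted(names)
    return ordered[:max(len(ordered) - keep, 0)]


# Fifteen hypothetical daily snapshots, 2020-09-01 .. 2020-09-15.
start = date(2020, 9, 1)
names = [f"daily-{(start + timedelta(days=i)):%Y.%m.%d}" for i in range(15)]

old = snapshots_to_delete(names)
print(old)  # the five oldest snapshots
```

Each returned name would then be removed with DELETE /_snapshot/<repository>/<snapshot>. Elasticsearch's snapshot lifecycle management (SLM) also offers built-in retention settings (such as `max_count`), which may be preferable to a hand-rolled script.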

The last thing worth noting is that when you delete a snapshot, Elasticsearch deletes only those files associated with it that are not in use by any other snapshot, which makes deleting snapshots inherently safe for the snapshots that remain.
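The deletion rule can be modelled as reference counting over files. In this hypothetical Python sketch, each snapshot is the set of file names it references, and deleting one frees only the files no remaining snapshot uses. The file-to-document layout mirrors the snapshot_a / snapshot_b experiment above but is a simplification of real Lucene segments.

```python
def delete_snapshot(repo, snapshots, name):
    """Remove snapshot `name` plus only the files no other snapshot uses.

    repo:      dict mapping file name -> stored contents
    snapshots: dict mapping snapshot name -> set of referenced file names
    Returns the sorted list of files that were physically deleted.
    """
    removed = snapshots.pop(name)
    still_used = set().union(*snapshots.values())
    freed = sorted(f for f in removed if f not in still_used)
    for f in freed:
        del repo[f]
    return freed


repo = {"_0": ["doc1"], "_1": ["doc2"]}
snapshots = {
    "snapshot_a": {"_0"},          # taken after indexing document 1
    "snapshot_b": {"_0", "_1"},    # taken after indexing document 2
}

freed = delete_snapshot(repo, snapshots, "snapshot_a")
print(freed)  # nothing freed: _0 is still referenced by snapshot_b

# "Restoring" snapshot_b reads back every file it references.
restored = sorted(doc for f in snapshots["snapshot_b"] for doc in repo[f])
print(restored)
```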




Answer 2:


An Elasticsearch snapshot is an exact copy of your cluster data as it was when the snapshot was triggered.

So yes, if you restore the snapshot taken on 03/09/2020, it will contain the old data. The content of the cluster after restoring the snapshot will be exactly the same as it was when you triggered the snapshot on 03/09/2020.

You asked several questions, so let me answer them one by one:

Q1. My admins told me that I must clear the last 10 snapshots from the snapshot repository to free disk space. If somebody (or I) accidentally runs curl -XDELETE /*, which deletes all indices in my cluster, and I must restore all the business data that was there, and I have only the 10 snapshots from the last 10 days, can I restore all the data?

A1. If you delete all 10 available snapshots, you are left with no snapshots in your repository, so you won't be able to restore anything: to restore from a snapshot, that snapshot must exist in the repository. You can list the snapshots available in a repository with GET /_cat/snapshots/<repository>
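As a sketch of checking what is restorable before you need it, the Python below fetches the cat snapshots API with `format=json` and keeps the successful snapshots, newest first. The host URL, the absence of authentication, and the reliance on the `id`, `status` and `end_epoch` columns (returned as strings by cat APIs) are assumptions to verify against your Elasticsearch version; the sample rows are made up.

```python
import json
import urllib.request


def fetch_snapshot_rows(repository, host="http://localhost:9200"):
    """GET /_cat/snapshots/<repository>?format=json (host is an assumption)."""
    url = f"{host}/_cat/snapshots/{repository}?format=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def restorable(rows):
    """Return the ids of SUCCESS snapshots, newest first (by end_epoch)."""
    ok = [r for r in rows if r["status"] == "SUCCESS"]
    ok.sort(key=lambda r: int(r["end_epoch"]), reverse=True)
    return [r["id"] for r in ok]


sample = [  # made-up rows, for illustration only
    {"id": "daily-2020.09.02", "status": "SUCCESS", "end_epoch": "1599004800"},
    {"id": "daily-2020.09.03", "status": "FAILED", "end_epoch": "1599091200"},
    {"id": "daily-2020.09.01", "status": "SUCCESS", "end_epoch": "1598918400"},
]
print(restorable(sample))
print(restorable([]))  # empty repository: nothing to restore
```

Against a real cluster you would call `restorable(fetch_snapshot_rows("my-snapshots"))`; an empty result means there is nothing left to restore from.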

Q2. Does it restore data only from the last snapshot's date?

A2. No. You can restore from whichever snapshot you choose, as long as it exists in your repository, and the data will be recovered from that snapshot. For a full cluster restore, as I mentioned earlier, the content of the cluster will be exactly the same as it was when that snapshot was triggered.

Q3. For example, the customer Joe adds something to his cart on my website on 01/09/2020, then on 15/09/2020 I delete all data from the cluster, and my last snapshot in the snapshot repository is from 03/09/2020. If I restore from this snapshot, will it contain the old data or not?

A3. Yes, this snapshot will also contain the data from 01/09/2020, because when you created the snapshot on 03/09/2020, that data was present in the cluster.
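Answer 2's point-in-time view can be illustrated with a hypothetical Python timeline in which every daily snapshot is a deep copy of the live data. This models only what a snapshot logically contains, not how Elasticsearch stores it incrementally; the user names and cart items are invented.

```python
import copy
from datetime import date, timedelta

cluster = {}    # live data: user -> cart items (hypothetical)
snapshots = {}  # snapshot date -> frozen copy of the cluster at that moment

for offset in range(14):  # daily snapshots from 01/09 to 14/09
    day = date(2020, 9, 1) + timedelta(days=offset)
    if day == date(2020, 9, 1):
        cluster["joe"] = ["book"]   # Joe fills his cart on 01/09
    if day == date(2020, 9, 5):
        cluster["ann"] = ["pen"]    # another customer shows up on 05/09
    snapshots[day] = copy.deepcopy(cluster)  # point-in-time copy

cluster.clear()  # 15/09: the accidental DELETE wipes everything

# Pretend only the 03/09 snapshot survived in the repository.
restored = copy.deepcopy(snapshots[date(2020, 9, 3)])
print(restored)
```

The restored data contains Joe's cart but not `ann`, who was added after 03/09; a snapshot from 14/09 would contain both.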



Source: https://stackoverflow.com/questions/64498925/elasticsearch-snapshots
