How to absolutely delete something from ElasticSearch?

Submitted by 三世轮回 on 2019-12-11 00:57:22

Question


We use an ELK stack for our logging. I've been asked to design a process for how we would remove sensitive information that had been logged accidentally.

Based on my reading about how ElasticSearch (Lucene) handles deletes and updates, the data is still in the index, just not available. It will eventually get cleaned up as segments get merged, etc.

Is there a process to run an update (to redact something) or delete (to remove something) and guarantee its removal?


Answer 1:


When you update or delete a value, ES marks the current document as deleted and, for an update, indexes a new document. The deleted value is still present in the index, but it will never be returned by a search. Granted, if someone gets access to the underlying index files, they might be able to use a tool (Luke or similar) to inspect those files and potentially see the deleted sensitive data.
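
As a minimal sketch of the delete step described above, the following Python builds a `_delete_by_query` request with the standard library only. The cluster address, the field name `message`, the sample phrase, and the helper name are assumptions for illustration, not part of the original answer. Sending the request only marks the matching documents as deleted.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed cluster address


def build_delete_by_query_request(index, field, phrase, base_url=ES_URL):
    """Build (but do not send) a request that deletes documents whose
    `field` contains `phrase`. Matching documents are only marked as deleted."""
    body = {"query": {"match_phrase": {field: phrase}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "message" and the phrase are hypothetical; adapt them to your mapping.
    req = build_delete_by_query_request("myindex", "message", "secret-token-123")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```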

The only way to guarantee that documents marked as deleted are really removed from the index segments is to force a merge of the existing segments.

POST /myindex/_forcemerge?only_expunge_deletes=true

Be aware, though, that a setting called index.merge.policy.expunge_deletes_allowed defines a threshold below which the force merge doesn't happen. By default this threshold is 10% (the check is applied per segment), so if fewer than 10% of the documents are marked as deleted, the force merge call won't do anything. You might need to lower the threshold for the deletion to actually happen. Or, perhaps easier, make sure not to index sensitive information in the first place.
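
The two steps above (lower the threshold, then force the merge) could be sketched in Python roughly as follows. This assumes a cluster reachable at `http://localhost:9200` without authentication, and that the setting can be updated on a live index through the `_settings` endpoint. The helper names are hypothetical; treat it as an outline rather than a tested procedure.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed cluster address


def build_threshold_request(index, percent=0, base_url=ES_URL):
    """Request that lowers index.merge.policy.expunge_deletes_allowed."""
    body = {"index.merge.policy.expunge_deletes_allowed": percent}
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_expunge_request(index, base_url=ES_URL):
    """Request equivalent to POST /<index>/_forcemerge?only_expunge_deletes=true."""
    return urllib.request.Request(
        f"{base_url}/{index}/_forcemerge?only_expunge_deletes=true",
        method="POST",
    )


def expunge_deleted_documents(index, base_url=ES_URL):
    """Lower the threshold, then force the merge. Needs a running cluster."""
    for req in (build_threshold_request(index, 0, base_url),
                build_expunge_request(index, base_url)):
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)


if __name__ == "__main__":
    # Print the requests instead of sending them.
    for req in (build_threshold_request("myindex"), build_expunge_request("myindex")):
        print(req.get_method(), req.full_url)
```

Calling `expunge_deleted_documents("myindex")` sends both requests in order. If you want the default merge behavior back, you may want to restore the threshold to its previous value afterwards.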



Source: https://stackoverflow.com/questions/50986201/how-to-absolutely-delete-something-from-elasticsearch
