ElasticSearch: Unassigned Shards, how to fix?

悲&欢浪女 2020-12-04 05:03

I have an ES cluster with 4 nodes:

number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false,          


        
24 Answers
  •  春和景丽
    2020-12-04 05:42

    I had two indices with unassigned shards that didn't seem to be self-healing. I eventually resolved this by temporarily adding an extra data-node[1]. After the indices became healthy and everything stabilized to green, I removed the extra node and the system was able to rebalance (again) and settle on a healthy state.

    It's a good idea to avoid killing multiple data nodes at once (which is how I got into this state): most likely, not a single copy or replica of at least one shard survived. Luckily, Kubernetes kept the disk storage around and reused it when I relaunched the data-node.


    ...Some time has passed...

    Well, this time just adding a node didn't seem to be working (after waiting several minutes for something to happen), so I started poking around in the REST API.

    GET /_cluster/allocation/explain
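
    With no body, that endpoint explains an arbitrary unassigned shard. If you want the verdict for one particular shard, it also accepts a small JSON body, roughly like this (the index name is a placeholder):

    GET /_cluster/allocation/explain
    {
      "index": "my-index",
      "shard": 0,
      "primary": true
    }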
    

    In my case, the plain GET (no body) showed my new node with "decision": "YES".

    By the way, all of the pre-existing nodes had "decision": "NO" due to "the node is above the low watermark cluster setting", i.e. their disks were too full for the allocator to place shards on them. So this was probably a different case from the one I had addressed previously.
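
    If the watermark really is what's blocking allocation, the proper fix is freeing disk space or adding capacity; as a stopgap you can also raise the low watermark transiently, something like the sketch below (85% is the default, and setting it back to null later restores that default):

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "87%"
      }
    }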

    Then I made the following simple POST[2] with no body, which kicked things into gear...

    POST /_cluster/reroute
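
    If the plain reroute doesn't help because shards have already used up their allocation retries, the same endpoint can be told to retry those explicitly; a hedged variant:

    POST /_cluster/reroute?retry_failed=true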
    

    Other notes:

    • Very helpful: https://datadoghq.com/blog/elasticsearch-unassigned-shards

    • Something else that may work: set cluster_concurrent_rebalance to 0, then back to null, as I demonstrate here (see the sketch just below this list).
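
    A rough sketch of that toggle (the full setting key is cluster.routing.allocation.cluster_concurrent_rebalance; setting it to null restores the default):

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": 0
      }
    }

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": null
      }
    }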


    [1] Pretty easy to do in Kubernetes if you have enough headroom: just scale out the stateful set via the dashboard.
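
    If you prefer the command line to the dashboard, the equivalent is roughly this (the StatefulSet name and replica count are placeholders for whatever your deployment uses):

    kubectl scale statefulset es-data --replicas=4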

    [2] Using the Kibana "Dev Tools" interface, I didn't have to bother with SSH/exec shells.
