I have an ES cluster with 4 nodes:
number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false, data: true
search04 - master: false, data: true
I had two indices with unassigned shards that didn't seem to be self-healing. I eventually resolved this by temporarily adding an extra data-node[1]. After the indices became healthy and everything stabilized to green, I removed the extra node and the system was able to rebalance (again) and settle on a healthy state.
It's a good idea to avoid killing multiple data nodes at once (which is how I got into this state). Likely, I had failed to preserve any copies/replicas for at least one of the shards. Luckily, Kubernetes kept the disk storage around, and reused it when I relaunched the data-node.
...Some time has passed...
Well, this time just adding a node didn't seem to be working (after waiting several minutes for something to happen), so I started poking around in the REST API.
GET /_cluster/allocation/explain
This showed my new node with "decision": "YES". By the way, all of the pre-existing nodes had "decision": "NO" due to "the node is above the low watermark cluster setting". So this was probably a different case than the one I had addressed previously.
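If you want the explanation for one particular shard rather than whichever unassigned shard the cluster picks for you, the same endpoint accepts a small body (the index name below is just a placeholder):

GET /_cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}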
Then I made the following simple POST[2] with no body, which kicked things into gear...
POST /_cluster/reroute
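I didn't need it in my case, but if shards are stuck because they exceeded the allocation retry limit, the reroute call also accepts a retry_failed flag:

POST /_cluster/reroute?retry_failed=true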
Other notes:
Very helpful: https://datadoghq.com/blog/elasticsearch-unassigned-shards
Something else that may work: set cluster_concurrent_rebalance to 0, then to null -- as I demonstrate here.
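For reference, that toggle looks roughly like this in the Dev Tools console (I'm using transient settings here; persistent would also work):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 0
  }
}

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": null
  }
}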
[1] Pretty easy to do in Kubernetes if you have enough headroom: just scale out the stateful set via the dashboard.
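From the command line, the equivalent is something like the following (the StatefulSet name and replica count are made-up examples; adjust for your deployment):

kubectl scale statefulset elasticsearch-data --replicas=4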
[2] Using the Kibana "Dev Tools" interface, I didn't have to bother with SSH/exec shells.