How to restart Kubernetes nodes?


予麋鹿 2020-12-24 00:53

The status of my nodes is reported as Unknown:

"conditions": [
          {
            "type": "Ready",
            "status": "Unknown",


      
      
        
6 answers

再見小時候 (original poster) 2020-12-24 01:37
              

            
            
                        
If a node is so unhealthy that the master can't get its status, Kubernetes may not be able to restart the node. And if health checks aren't working, you probably can't reach the node over SSH either.
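To see which nodes are in this state, one option is to inspect the Ready condition of every node. The following is a minimal Python sketch, assuming you have saved the output of kubectl get nodes -o json; the function name nodes_not_ready and the sample node names are hypothetical and used only for illustration.

```python
import json


def nodes_not_ready(node_list_json: str) -> dict:
    """Return {node name: Ready status} for nodes whose Ready status is not "True".

    Expects the JSON printed by `kubectl get nodes -o json`
    (a List object with an "items" array).
    """
    data = json.loads(node_list_json)
    result = {}
    for node in data.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        # Pick the condition of type "Ready"; treat a missing one as Unknown.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        status = ready["status"] if ready else "Unknown"
        if status != "True":
            result[name] = status
    return result


# Hypothetical sample, trimmed to the fields the function reads.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "worker-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "worker-2"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
})

unhealthy = nodes_not_ready(SAMPLE)
```

With the sample above, only worker-2 is reported, because its Ready condition is Unknown rather than True.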

In this case, you may have to hard-reboot -- or, if your hardware is in the cloud, let your provider do it.

For example, the AWS EC2 Dashboard allows you to right-click an instance to pull up an "Instance State" menu -- from which you can reboot/terminate an unresponsive node.

Before doing this, you might choose to run kubectl cordon <node-name> for good measure, so that no new pods are scheduled onto the node. And if the node doesn't automatically rejoin the cluster after the reboot, kubectl delete node <node-name> may be an important part of getting things back to normal.
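The order of operations around the reboot can be sketched as follows. This is only an illustration in Python: the helper names and the node name worker-2 are hypothetical, the reboot itself happens out of band (console, power cycle, or cloud provider), and the uncordon step is an addition that is needed only if you cordoned the node beforehand. The commands are printed rather than executed unless dry_run is set to False.

```python
import subprocess


def pre_reboot_commands(node_name: str) -> list:
    """Commands to run before the hard reboot: stop new pods landing on the node."""
    return [["kubectl", "cordon", node_name]]


def post_reboot_commands(node_name: str, rejoined: bool) -> list:
    """Commands to run after the reboot.

    If the node came back and reports Ready, make it schedulable again.
    If it did not rejoin, delete the stale Node object so it can register afresh.
    """
    if rejoined:
        return [["kubectl", "uncordon", node_name]]
    return [["kubectl", "delete", "node", node_name]]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run for a hypothetical node that failed to rejoin after the reboot.
run_all(pre_reboot_commands("worker-2"))
run_all(post_reboot_commands("worker-2", rejoined=False))
```

Keeping the command construction separate from execution makes it easy to review exactly what would be run against the cluster before anything is changed.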



Why would a node become unresponsive? Probably some resource has been exhausted in a way that prevents the host operating system from handling new requests in a timely manner. This could be disk or network, but the more insidious case is out-of-memory (OOM), which Linux handles poorly.

To help Kubernetes manage node memory safely, it's a good idea to do both of the following:


1. Reserve some memory for the system.
2. Be very careful with (ideally avoid) opportunistic memory specifications for your pods. In other words, don't allow different values of requests and limits for memory: set each container's memory request equal to its memory limit.


The idea here is to avoid the complications associated with memory overcommit, because memory is incompressible, and both the Linux OOM killer and Kubernetes' own out-of-memory handling may fail to trigger before the node has already become unhealthy and unreachable.
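Both recommendations can be checked with a little arithmetic. The sketch below, in Python, assumes the usual node-allocatable relationship (capacity minus the kubelet's kube-reserved, system-reserved and hard eviction threshold) and a simplified parser for memory quantities that handles only common suffixes. All function names, the reservation sizes and the sample pod are hypothetical.

```python
# Simplified subset of Kubernetes memory suffixes (assumption: no exponent forms).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as "512Mi" or "2G" to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # a plain number is already in bytes


def allocatable_memory(capacity: str, kube_reserved: str,
                       system_reserved: str, eviction_hard: str) -> int:
    """Bytes left for pods after the reservations are subtracted."""
    return (parse_memory(capacity) - parse_memory(kube_reserved)
            - parse_memory(system_reserved) - parse_memory(eviction_hard))


def containers_with_memory_overcommit(pod: dict) -> list:
    """Names of containers whose memory request and limit are not equal."""
    flagged = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        request = resources.get("requests", {}).get("memory")
        limit = resources.get("limits", {}).get("memory")
        if limit is None:
            flagged.append(container["name"])  # no limit: unbounded memory use
        elif request is not None and parse_memory(request) != parse_memory(limit):
            flagged.append(container["name"])  # request below limit: overcommit
        # A limit without a request is fine: the request defaults to the limit.
    return flagged


# Hypothetical pod manifest, already loaded into a dict.
POD = {"spec": {"containers": [
    {"name": "app", "resources": {"requests": {"memory": "256Mi"},
                                  "limits": {"memory": "512Mi"}}},
    {"name": "sidecar", "resources": {"requests": {"memory": "128Mi"},
                                      "limits": {"memory": "128Mi"}}},
]}}

flagged = containers_with_memory_overcommit(POD)
```

Here the app container is flagged because it requests 256Mi but may grow to 512Mi, while sidecar is not. For a hypothetical 8Gi node, allocatable_memory("8Gi", "500Mi", "500Mi", "100Mi") leaves 7092Mi for pods.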
    
             
                                                        
            
            
              
                