Is it possible to prevent killing some pods when scaling down?

[愿得一人] 2020-12-31 12:53

I need to scale a set of pods that run queue-based workers. Jobs for workers can run for a long time (hours) and should not get interrupted. The number of pods is based on t

2 Answers
  •  Happy的楠姐
    2020-12-31 13:20

    There is a kind of workaround that can give you some control over pod termination. I'm not quite sure if it is a best practice, but you can at least try it and test whether it suits your app.

    1. Increase the Deployment's grace period with terminationGracePeriodSeconds: 3600, where 3600 is the duration in seconds of the longest possible task in the app. This makes sure the pods are not forcibly terminated before the grace period ends. Read the docs about the pod termination process for details. (A combined manifest sketch is shown after this list.)
    2. Define a preStop handler. More details about lifecycle hooks can be found in the docs as well as in the example. In my case, I've used the script below to create a file that is later used as a trigger to terminate the pod (probably there are more elegant solutions).
      # add under the container entry in the Deployment's pod template
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]
      
      
    3. Stop the app as soon as the condition is met. When the app exits, the pod terminates as well. It is not possible to kill the PID 1 process from the preStop shell script, so you need to add some logic to the app so it terminates itself. In my case it is a NodeJS app: a scheduler runs every 30 seconds and checks whether two conditions are met. !isNodeBusy identifies whether the app is allowed to finish, and fs.existsSync('/home/node/app/preStop') checks whether the preStop hook was triggered. The logic might be different for your app, but you get the basic idea.
      const schedule = require('node-schedule');
      const fs = require('fs');

      // assuming the node-schedule package: every 30 seconds, exit once the
      // worker is idle and the preStop trigger file exists
      schedule.scheduleJob('*/30 * * * * *', () => {
        if (!isNodeBusy && fs.existsSync('/home/node/app/preStop')) {
          process.exit();
        }
      });
      

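    For reference, here is a minimal Deployment manifest sketch combining steps 1 and 2; the name, image, and 3600-second value are placeholders to adapt to your app.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: queue-worker                          # hypothetical name
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: queue-worker
        template:
          metadata:
            labels:
              app: queue-worker
          spec:
            terminationGracePeriodSeconds: 3600     # step 1: longest possible task, in seconds
            containers:
              - name: worker
                image: registry.example.com/queue-worker:latest   # placeholder image
                lifecycle:
                  preStop:                          # step 2: create the trigger file
                    exec:
                      command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]
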
    Keep in mind that this workaround only works with voluntary disruptions and is obviously not helpful with involuntary disruptions. More info in the docs.
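
    As additional context, voluntary disruptions (for example node drains) can also be bounded with a PodDisruptionBudget; a minimal sketch, assuming the same app: queue-worker label as in the manifest above:
      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: queue-worker-pdb                      # hypothetical name
      spec:
        maxUnavailable: 1
        selector:
          matchLabels:
            app: queue-worker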
