I used the following logic to restart the uncompleted jobs on a single-node Spring Batch application:
public void restartUncompletedJobs() {
try {
Your logic is not restarting uncompleted jobs. It takes the currently running job executions, sets their status to FAILED, and restarts them. Instead of finding running executions, it should look for executions that are not currently running, especially failed ones, and restart those.
How do I correctly restart the failed jobs and prevent jobs like jobInstance2 from also being restarted?
In pseudocode, what you need to do to achieve this is:
1. Get the job instances using JobOperator#getJobInstances.
2. For each instance, check whether there is a running execution using JobOperator#getExecutions.
2.1 If there is a running execution, move on to the next instance (to let the execution finish, either successfully or with a failure).
2.2 If there is no currently running execution, check the status of the last execution and, if it failed, restart it using JobOperator#restart.
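The steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not a drop-in implementation. `BatchOps` and `Status` are hypothetical stand-ins so that the example compiles without Spring Batch on the classpath: three of the methods mirror JobOperator#getJobInstances, JobOperator#getExecutions and JobOperator#restart, while `getStatus` is an assumed helper that a real application would probably back with JobExplorer#getJobExecution. The sketch also assumes that execution IDs come back most recent first, which is worth verifying for your Spring Batch version.

```java
import java.util.ArrayList;
import java.util.List;

class RestartFailedJobs {

    /** Hypothetical status values; a real application would use Spring Batch's BatchStatus. */
    enum Status { STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED }

    /**
     * Hypothetical stand-in for the operations used in the steps above.
     * In Spring Batch these calls also declare checked exceptions, omitted here.
     */
    interface BatchOps {
        List<Long> getJobInstances(String jobName, int start, int count);
        List<Long> getExecutions(long instanceId); // assumption: most recent first
        Status getStatus(long executionId);        // assumed helper, not a JobOperator method
        Long restart(long executionId);
    }

    /** An execution counts as running while it is starting, started, or stopping. */
    static boolean isRunning(Status status) {
        return status == Status.STARTING || status == Status.STARTED || status == Status.STOPPING;
    }

    /** Restarts the last execution of every instance that has no running execution and last failed. */
    static List<Long> restartFailedJobs(BatchOps ops, String jobName, int maxInstances) {
        List<Long> restartedExecutionIds = new ArrayList<>();
        // Step 1: get the job instances.
        for (Long instanceId : ops.getJobInstances(jobName, 0, maxInstances)) {
            // Step 2: look at the executions of this instance.
            List<Long> executionIds = ops.getExecutions(instanceId);
            if (executionIds.isEmpty()) {
                continue; // nothing to restart
            }
            // Step 2.1: skip the instance if any execution is still running (possibly on another node).
            boolean hasRunningExecution = executionIds.stream()
                    .anyMatch(id -> isRunning(ops.getStatus(id)));
            if (hasRunningExecution) {
                continue;
            }
            // Step 2.2: restart the last execution only if it failed.
            long lastExecutionId = executionIds.get(0);
            if (ops.getStatus(lastExecutionId) == Status.FAILED) {
                ops.restart(lastExecutionId);
                restartedExecutionIds.add(lastExecutionId);
            }
        }
        return restartedExecutionIds;
    }
}
```

The method returns the IDs of the executions it restarted, which makes it easy to log or assert on what happened. Because it never touches an instance that still has a running execution, it does not need to change any status to FAILED first.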
In your scenario:
- jobInstance1 should be restarted in step 2.2.
- jobInstance2 should be filtered out in step 2.1, since there is a running execution for it on node 2.