Spring Batch partitioned step stopped hours after a non-skippable exception occurred

喜欢而已 submitted on 2021-01-29 16:23:31

Question

I want to verify a behaviour of Spring Batch...
When running a partitioned step of a Job I got this exception:

org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:111)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:137)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:152)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:131)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:301)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:134)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:127)

That was all: there was no preceding exception that might have triggered it, and the job then ended with a FAILED status.

When I searched the logs from the preceding hours and days, I found these exceptions (three of them, in different partitioned steps):

06/05/2014 21:50:51.996 [Step3TaskExecutor-12] [] ERROR                   AbstractStep - Line (222) Encountered an error executing the step
org.springframework.retry.RetryException: Non-skippable exception in recoverer while processing; nested exception is java.io.FileNotFoundException: Source 'blabla....pdf' does not exist
    at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$2.recover(FaultTolerantChunkProcessor.java:281)
    at org.springframework.retry.support.RetryTemplate.handleRetryExhausted(RetryTemplate.java:435)
    at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:304)
    at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:188)
    at org.springframework.batch.core.step.item.BatchRetryTemplate.execute(BatchRetryTemplate.java:217)
    at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor.transform(FaultTolerantChunkProcessor.java:290)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:192)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:75)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:267)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
    at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:253)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
    at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:139)
    at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:136)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: Source 'blabla....pdf' does not exist

It seemed odd to me that the job continued to run after those exceptions. My guess is that only the slave steps in which this exception occurred failed, and that the master step waited for the remaining slave steps to finish before returning the first error shown above.
Can someone confirm that this is what happened? It has been driving me crazy for days.


Answer 1:

That is the correct behavior for Spring Batch's partitioning. The PartitionHandler in the master step evaluates the results of all slave steps together, once they have all returned (or timed out). As for what happened in the slaves, those logged errors look like the most likely cause to me. However, the definitive answer should be in the job repository (assuming you're using a database-backed implementation). When a step fails (even a partitioned slave), the exception is stored there.
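
The wait-then-aggregate behavior described above can be illustrated with a minimal sketch in plain Java. This is not Spring Batch code: the class name, the Status enum, and the runPartitions method are hypothetical stand-ins for what a task-executor-based partition handler does conceptually. Each partition is submitted to a thread pool, every Future is awaited, and the overall result is decided only after the last partition has returned.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionAggregationSketch {

    /** Hypothetical outcome type; not a Spring Batch class. */
    enum Status { COMPLETED, FAILED }

    /**
     * Submits every partition to a thread pool, waits for ALL of them,
     * and only then reports the overall status. Failure messages are
     * appended to the supplied list.
     */
    static Status runPartitions(List<Callable<Void>> partitions, List<String> failures)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (Callable<Void> partition : partitions) {
                futures.add(pool.submit(partition));
            }

            Status overall = Status.COMPLETED;
            for (Future<Void> future : futures) {
                try {
                    // Blocks until this partition finishes, even if an
                    // earlier partition has already failed.
                    future.get();
                } catch (ExecutionException e) {
                    failures.add(String.valueOf(e.getCause().getMessage()));
                    overall = Status.FAILED;
                }
            }
            return overall;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Void>> partitions = new ArrayList<>();
        // One partition fails immediately...
        partitions.add(() -> { throw new FileNotFoundException("Source does not exist"); });
        // ...while another keeps working for a while.
        partitions.add(() -> { Thread.sleep(500); return null; });

        List<String> failures = new ArrayList<>();
        Status status = runPartitions(partitions, failures);

        // The failure is reported only after the slow partition is done.
        System.out.println("Overall status: " + status);
        System.out.println("Failures: " + failures);
    }
}
```

In this sketch the early failure surfaces only after the slowest partition completes, which matches the gap of hours between the logged FileNotFoundException and the final JobExecutionException in the question. In a real job, the stored failure for each slave step can be inspected in the job repository, for example through the EXIT_MESSAGE column of the BATCH_STEP_EXECUTION table or through StepExecution.getExitStatus().getExitDescription().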



Source: https://stackoverflow.com/questions/23560233/spring-batch-partitioned-step-stopped-after-hours-from-when-a-non-skippable-exce