问题
We have spring batch job which is processing 100 million records in multithreded job with scaling process like partitioning. Here master step create 500 paritions and those are being processed by 100 threads. But sometimes job is failing with just following exception. If I rerun the job without any code change it just works. Can someone explain what might be causing issue in slave step which is running in diff thread which makes master step to fail and stop processing further.
2015-09-11 17:22:21,365 ERROR [task-scheduler-9] org.springframework.batch.core.step.AbstractStep - Encountered an error executing step productImport.master in job productImportJob
org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:112) ~[spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:198) ~[spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:165) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:134) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:304) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.1.2.RELEASE.jar:4.1.2.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128) [spring-batch-core-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.integration.launch.JobLaunchingMessageHandler.launch(JobLaunchingMessageHandler.java:50) [spring-batch-integration-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.batch.integration.launch.JobLaunchingGateway.handleRequestMessage(JobLaunchingGateway.java:76) [spring-batch-integration-3.0.3.RELEASE.jar:3.0.3.RELEASE]
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:99) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.PollingConsumer.handleMessage(PollingConsumer.java:74) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:219) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint.access$000(AbstractPollingEndpoint.java:55) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint$1.call(AbstractPollingEndpoint.java:149) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint$1.call(AbstractPollingEndpoint.java:146) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint$Poller$1.run(AbstractPollingEndpoint.java:298) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.util.ErrorHandlingTaskExecutor$1.run(ErrorHandlingTaskExecutor.java:52) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.1.2.RELEASE.jar:4.1.2.RELEASE]
at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:49) [spring-integration-core-4.1.2.RELEASE.jar:na]
at org.springframework.integration.endpoint.AbstractPollingEndpoint$Poller.run(AbstractPollingEndpoint.java:292) [spring-integration-core-4.1.2.RELEASE.jar:na]
回答1:
First adjust "maxPoolSize" property, if still facing the issue, adjust these other properties of your ThreadPoolTaskExecutor (assuming you using the same) according to the load on your job.
<property name="corePoolSize" value="20">
<property name="queueCapacity" value="20">
<property name="maxPoolSize" value="20">
<property name="allowCoreThreadTimeout" value="true">
来源:https://stackoverflow.com/questions/32629724/spring-batch-job-throws-error-like-partition-handler-returned-an-unsuccessful-s