Question
Following is what we are trying to achieve.
We want a large XML file to be staged into a database in parallel across different VMs. To achieve this, we are using the scalable Spring Batch remote-partitioning approach, and we are running into some issues. Here is the high-level setup:
- master - splits an XML file into multiple partitions (we currently have a grid size of 3)
- slave 1 - processes partitions (reads index-based partitions and writes to the DB)
- slave 2 - processes partitions
We are running on Linux with ActiveMQ 5.15.3.
With the above setup:
- slave 1 is processing 2 partitions at the same time
- slave 2 is processing 1 partition
The master is not waiting for all the slaves to complete, and the job is ending in an UNKNOWN state:
org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:112)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:200)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
If we set the grid size to 2, slave 1 picks up both partitions and slave 2 gets none (hence the "greedy slave"). Slave 1 processes them in parallel, but the job again ends in an UNKNOWN state.
Here are our questions:
- How can we prevent slave 1 from processing two partitions in parallel? We tried setting the prefetch limit to 0 and to 1 (see http://activemq.apache.org/what-is-the-prefetch-limit-for.html), but that did not work. How can we get each slave to process one partition at a time?
- Why is the master not waiting for all the slaves?
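For reference, one way to enforce a prefetch limit is on the broker URL of the connection factory rather than per destination. The following is a sketch only, assuming the `connectionFactory` bean is an `ActiveMQConnectionFactory`; the broker host and port are placeholders:

```xml
<!-- Sketch: limit each consumer to one unacknowledged message at a time.
     jms.prefetchPolicy.queuePrefetch is an ActiveMQ connection URL option;
     broker-host:61616 is a placeholder. -->
<beans:bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
    <beans:property name="brokerURL"
                    value="tcp://broker-host:61616?jms.prefetchPolicy.queuePrefetch=1"/>
</beans:bean>
```

Note that even with prefetch 1, each registered consumer can still hold one message, so the number of consumers per slave matters as much as the prefetch value.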
Following is our configuration
Master configuration
<?xml version="1.0" encoding="UTF-8"?>
<step id="remotePartitionStagingStep">
<partition partitioner="xmlPartitioner" handler="partitionStagingHandler"/>
<listeners>
<listener ref="jobListener" />
</listeners>
</step>
<!-- XML Partitioner starts here -->
<beans:bean id="xmlPartitioner" class="XMLPartitioner" scope="step">
<beans:property name="partitionThreadName" value="ImportXMLPartition-"/>
<beans:property name="resource" value="file:///#{jobParameters[ImportFileProcessed]}"/>
<beans:property name="rootElementNode" value="#{jobParameters[ImportFragmentRootNameWithPrefix]}"/>
</beans:bean>
<beans:bean id="partitionStagingHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
<beans:property name="stepName" value="slave.ImportStagingStep"/>
<beans:property name="gridSize" value="3"/>
<beans:property name="replyChannel" ref="aggregatedReplyChannel"/>
<beans:property name="jobExplorer" ref="jobExplorer"/>
<beans:property name="messagingOperations">
<beans:bean class="org.springframework.integration.core.MessagingTemplate">
<beans:property name="defaultChannel" ref="requestsChannel"/>
<beans:property name="receiveTimeout" value="${batch.gateway.receiveTimeout}" /> <!-- 360000 is the current value -->
</beans:bean>
</beans:property>
</beans:bean>
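Regarding the master not waiting: the `receiveTimeout` above bounds how long the `MessageChannelPartitionHandler` blocks for the aggregated reply. If the replies do not arrive within that window (360000 ms here), the receive comes back empty, the handler fails the partition step, and the job can end in an UNKNOWN state rather than waiting indefinitely. A sketch of widening that window (the 30-minute value is illustrative only, not a recommendation):

```xml
<!-- Sketch: give the slaves more time before the master gives up waiting
     for the aggregated reply. 1800000 ms (30 min) is an example value. -->
<beans:property name="receiveTimeout" value="1800000" />
```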
<int:aggregator ref="partitionStagingHandler"
input-channel="replyChannel"
output-channel="aggregatedReplyChannel"
send-timeout="${batch.gateway.receiveTimeout}"
expire-groups-upon-timeout="true"/>
<int:channel id="requestsChannel">
<int:interceptors>
<int:wire-tap channel="logChannel"/>
</int:interceptors>
</int:channel>
<int-jms:outbound-channel-adapter connection-factory="connectionFactory"
channel="requestsChannel"
destination-name="requestsQueue"/>
<int:channel id="aggregatedReplyChannel">
<int:queue/>
</int:channel>
<int:channel id="replyChannel">
<int:interceptors>
<int:wire-tap channel="logChannel"/>
</int:interceptors>
</int:channel>
<int-jms:message-driven-channel-adapter connection-factory="connectionFactory"
channel="replyChannel"
error-channel="errorChannel"
destination-name="replyQueue"/>
Slave configuration
<step id="slave.ImportStagingStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="StagingSpecificItemReader" processor="chainedStagingProcessor" writer="StagingItemWriter"
commit-interval="${import.CommitInterval}" skip-limit="${import.skipLimit}" retry-policy="neverRetryPolicy">
<streams>
<stream ref="errorFlatFileRecordWriter"/>
</streams>
<skippable-exception-classes>
<include class="java.lang.Exception"/>
<exclude class="org.springframework.oxm.UnmarshallingFailureException"/>
<exclude class="java.lang.Error"/>
</skippable-exception-classes>
</chunk>
<listeners>
<listener ref="stepExceptionListener"/>
<listener ref="stagingListener"/>
</listeners>
</tasklet>
<listeners>
<listener ref="slaveStepExecutionListener"/>
</listeners>
</step>
<beans:bean id="slaveStepExecutionListener" class="StepExecutionListener"></beans:bean>
<!-- JMS config for staging step starts here -->
<int:channel id="replyChannel">
<int:interceptors>
<int:wire-tap channel="logChannel"/>
</int:interceptors>
</int:channel>
<int:channel id="requestsChannel">
<int:interceptors>
<int:wire-tap channel="logChannel"/>
</int:interceptors>
</int:channel>
<int-jms:message-driven-channel-adapter connection-factory="connectionFactory"
destination-name="requestsQueue"
error-channel="errorChannel"
channel="requestsChannel"/>
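One knob worth checking on this adapter is the consumer concurrency: if it is raised anywhere, or if the context is loaded twice, a slave can pull multiple partitions at once. A sketch that pins the slave explicitly to a single JMS consumer:

```xml
<!-- Sketch: pin the slave to exactly one JMS consumer so it can only
     accept one partition request at a time -->
<int-jms:message-driven-channel-adapter connection-factory="connectionFactory"
        destination-name="requestsQueue"
        error-channel="errorChannel"
        channel="requestsChannel"
        concurrent-consumers="1"
        max-concurrent-consumers="1"/>
```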
<int-jms:outbound-channel-adapter connection-factory="connectionFactory"
destination-name="replyQueue"
channel="replyChannel"/>
<int:service-activator input-channel="requestsChannel"
output-channel="replyChannel"
ref="stepExecutionRequestHandler"/>
<!-- JMS config for staging step ends here -->
<!-- The logChannel is configured as an interceptor to channels so that messages are logged. -->
<int:logging-channel-adapter auto-startup="true" log-full-message="true" id="logChannel" level="INFO"/>
<int:channel id="errorChannel" />
<int:service-activator input-channel="errorChannel" method="handleException">
<beans:bean class="ErrorHandler" />
</int:service-activator>
Answer 1:
We found the issue we were having.
The slave was including the Spring Batch job definition twice, which caused the JMS consumers to be registered twice for each slave. We think we are still missing some configuration to route messages back from the slave to the master correctly when two partitions run concurrently on one slave. So for now, as long as each partition goes to a single slave (we are multithreading within the slave), we are fine.
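To illustrate the root cause: if the slave context is pulled in twice, directly and again via another import, every message-driven adapter in it is registered twice, doubling the consumer count per slave. The resource names below are hypothetical:

```xml
<!-- Pitfall sketch: slave-context.xml ends up imported twice, so each
     slave VM registers two consumers on requestsQueue and can be handed
     two partitions concurrently. File names are hypothetical. -->
<beans:import resource="classpath:slave-context.xml"/>
<beans:import resource="classpath:common-batch-context.xml"/>
<!-- ...where common-batch-context.xml itself also imports slave-context.xml -->
```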
Source: https://stackoverflow.com/questions/50826773/remote-partition-slave-getting-greedy