How to execute some partition step on all servers only once using spring batch partitioning?

Submitted by 一个人想着一个人 on 2019-12-12 01:33:15

Question


I am using Spring Batch partitioning. I read exchanges from files and do some processing for each exchange.

The exchanges are distributed over 4 servers for parallel processing using Spring Batch partitioning.

I have a first step which prepares input files with exchange IDs. I need to read these IDs on all servers.

Is there any way to run the first step exactly once on each server, so that the input files are prepared on all servers?

I tried setting grid size = 4 (the number of servers) and consumer concurrency = 1, so that on each server only one consumer listens for step execution requests.

The problem is that more than one request is handled by a single consumer, so the step runs more than once on some servers and doesn't run at all on others. As a result, the data is not prepared on some servers and the subsequent steps fail.

How can I make sure the step runs exactly once on every server?

Below is the configuration.

The import job has prepareExchangesListJob as its first step, which should behave as explained above, and importExchangesJob as its second step, which is a normal partitioned job. After importExchanges there are many more steps, all of them normal partitioned steps.

<job id="importJob">
    <step id="import.prepareExchangesListStep" next="import.importExchangesStep">
        <job ref="prepareExchangesListJob" />
    </step>
    <step id="import.importExchangesStep">
        <job ref="importExchangesJob" />
        <listeners>
            <listener ref="importExchangesStepNotifier" />
        </listeners>
    </step>
</job>

The prepareExchangesList job. Please note grid size = 4 (the number of servers) and consumer concurrency = 1, so that the step should execute only once on each server to prepare the input data (exchanges) on all servers.

<rabbit:template id="prepareExchangesListAmqpTemplate"
    connection-factory="rabbitConnectionFactory" routing-key="prepareExchangesListQueue"
    reply-timeout="${prepare.exchanges.list.step.timeout}">
</rabbit:template>

<int:channel id="prepareExchangesListOutboundChannel">
    <int:dispatcher task-executor="taskExecutor" />
</int:channel>

<int:channel id="prepareExchangesListInboundStagingChannel" />

<amqp:outbound-gateway request-channel="prepareExchangesListOutboundChannel"
    reply-channel="prepareExchangesListInboundStagingChannel"
    amqp-template="prepareExchangesListAmqpTemplate"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<beans:bean id="prepareExchangesListMessagingTemplate"
    class="org.springframework.integration.core.MessagingTemplate"
    p:defaultChannel-ref="prepareExchangesListOutboundChannel"
    p:receiveTimeout="${prepare.exchanges.list.step.timeout}" />


<beans:bean id="prepareExchangesListPartitioner"
    class="org.springframework.batch.core.partition.support.SimplePartitioner"
    scope="step" />


<beans:bean id="prepareExchangesListPartitionHandler"
    class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
    p:stepName="prepareExchangesListStep" p:gridSize="${prepare.exchanges.list.grid.size}"
    p:messagingOperations-ref="prepareExchangesListMessagingTemplate" />

<int:aggregator ref="prepareExchangesListPartitionHandler"
    send-partial-result-on-expiry="true"
    send-timeout="${prepare.exchanges.list.step.timeout}"
    input-channel="prepareExchangesListInboundStagingChannel" />

<amqp:inbound-gateway concurrent-consumers="1"
    request-channel="prepareExchangesListInboundChannel" reply-channel="prepareExchangesListOutboundStagingChannel"
    queue-names="prepareExchangesListQueue" connection-factory="rabbitConnectionFactory"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<int:channel id="prepareExchangesListInboundChannel" />

<int:service-activator ref="stepExecutionRequestHandler"
    input-channel="prepareExchangesListInboundChannel" output-channel="prepareExchangesListOutboundStagingChannel" />

<int:channel id="prepareExchangesListOutboundStagingChannel" />

<beans:bean id="prepareExchangesFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader"
    p:resource="classpath:primary_markets.txt"
    p:lineMapper-ref="stLineMapper" scope="step" />


<beans:bean id="prepareExchangesItemWriter"
    class="com.st.batch.foundation.writers.PrepareExchangesItemWriter"
    p:dirPath="${spring.tmp.batch.dir}/#{jobParameters[batch_id]}" p:numberOfFiles="4" 
    p:symfony-ref="symfonyStepScoped" scope="step" />


<step id="prepareExchangesListStep">
    <tasklet transaction-manager="transactionManager">
        <chunk reader="prepareExchangesFileItemReader" writer="prepareExchangesItemWriter" commit-interval="${prepare.exchanges.commit.interval}"/>
    </tasklet>
</step>

<job id="prepareExchangesListJob" restartable="true">
    <step id="prepareExchangesListStep.master">
        <partition partitioner="prepareExchangesListPartitioner"
            handler="prepareExchangesListPartitionHandler" />
    </step>
</job>

Import Exchanges Job

<rabbit:template id="importExchangesAmqpTemplate"
    connection-factory="rabbitConnectionFactory" routing-key="importExchangesQueue"
    reply-timeout="${import.exchanges.partition.timeout}">
</rabbit:template>

<int:channel id="importExchangesOutboundChannel">
    <int:dispatcher task-executor="taskExecutor" />
</int:channel>

<int:channel id="importExchangesInboundStagingChannel" />

<amqp:outbound-gateway request-channel="importExchangesOutboundChannel"
    reply-channel="importExchangesInboundStagingChannel" amqp-template="importExchangesAmqpTemplate"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<beans:bean id="importExchangesMessagingTemplate"
    class="org.springframework.integration.core.MessagingTemplate"
    p:defaultChannel-ref="importExchangesOutboundChannel"
    p:receiveTimeout="${import.exchanges.partition.timeout}" />


<beans:bean id="importExchangesPartitionHandler"
    class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
    p:stepName="importExchangesStep" p:gridSize="${import.exchanges.grid.size}"
    p:messagingOperations-ref="importExchangesMessagingTemplate" />

<int:aggregator ref="importExchangesPartitionHandler"
    send-partial-result-on-expiry="true"
    send-timeout="${import.exchanges.step.timeout}"
    input-channel="importExchangesInboundStagingChannel" />

<amqp:inbound-gateway concurrent-consumers="${import.exchanges.consumer.concurrency}"
    request-channel="importExchangesInboundChannel" reply-channel="importExchangesOutboundStagingChannel"
    queue-names="importExchangesQueue" connection-factory="rabbitConnectionFactory"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<int:channel id="importExchangesInboundChannel" />

<int:service-activator ref="stepExecutionRequestHandler"
    input-channel="importExchangesInboundChannel" output-channel="importExchangesOutboundStagingChannel" />

<int:channel id="importExchangesOutboundStagingChannel" />


<beans:bean id="importExchangesItemWriter"
    class="com.st.batch.foundation.writers.ImportExchangesAndEclsItemWriter"
    p:symfony-ref="symfonyStepScoped" p:timeout="${import.exchanges.item.timeout}"
    scope="step" />

<beans:bean id="importExchangesPartitioner"
    class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
    p:resources="file:${spring.tmp.batch.dir}/#{jobParameters[batch_id]}/exchanges/exchanges_*.txt"
    scope="step" />

<beans:bean id="importExchangesFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader"
    p:resource="#{stepExecutionContext[fileName]}" p:lineMapper-ref="stLineMapper"
    scope="step" />

<step id="importExchangesStep">
    <tasklet transaction-manager="transactionManager">
        <chunk reader="importExchangesFileItemReader" writer="importExchangesItemWriter" commit-interval="${import.exchanges.commit.interval}"/>
    </tasklet>
</step>

<job id="importExchangesJob" restartable="true">
    <step id="importExchangesStep.master">
        <partition partitioner="importExchangesPartitioner"
            handler="importExchangesPartitionHandler" />
    </step>
</job>

Answer 1:


Interesting technique.

I would expect the four partitions to be distributed evenly; RabbitMQ typically does round-robin distribution to competing consumers (AFAIK), so I am not exactly sure why you're not seeing that behavior.

You could spend some time trying to figure it out, but it's fragile in that you are relying on this; if one of the slaves had a network glitch, its partition would go to one of the others. It would be better to have each slave bind to a different queue and explicitly route the partitions by adding a routing key expression to the (first) outbound gateway...

routing-key-expression="'foo.' + headers['sequenceNumber']"

and have the slaves listen on foo.1, foo.2 etc., and continue to use a common queue for the second step.
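A minimal sketch of that suggestion applied to the question's first step, assuming the default exchange and a hypothetical per-server property `server.partition.index` that each server sets to its own number. The `prepareExchangesListQueue.N` names are illustrative; the suffixes must match the `sequenceNumber` values the partition handler actually sends (check whether your Spring Batch version numbers partitions from 0 or from 1), and `${prepare.exchanges.list.grid.size}` must equal the number of servers. The importExchanges wiring stays on the shared `importExchangesQueue`.

```xml
<!-- Master: route partition N of prepareExchangesListStep to queue prepareExchangesListQueue.N.
     With the default exchange (""), the routing key is the queue name; the expression is
     evaluated per message and used instead of the routing-key on prepareExchangesListAmqpTemplate. -->
<amqp:outbound-gateway request-channel="prepareExchangesListOutboundChannel"
    reply-channel="prepareExchangesListInboundStagingChannel"
    amqp-template="prepareExchangesListAmqpTemplate"
    routing-key-expression="'prepareExchangesListQueue.' + headers['sequenceNumber']"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />

<!-- Every server: declare and consume only its own queue.
     server.partition.index is a hypothetical property set differently on each server.
     Declaring the queue assumes a <rabbit:admin> bean is present. -->
<rabbit:queue id="prepareExchangesListServerQueue"
    name="prepareExchangesListQueue.${server.partition.index}" />

<amqp:inbound-gateway concurrent-consumers="1"
    request-channel="prepareExchangesListInboundChannel" reply-channel="prepareExchangesListOutboundStagingChannel"
    queue-names="prepareExchangesListQueue.${server.partition.index}" connection-factory="rabbitConnectionFactory"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
```

With this wiring, a partition meant for a server that is down is not picked up by another server; it is never processed elsewhere, and the master eventually times out instead of running the step twice on one machine, which is the behavior this step needs.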

This assumes you are using the default exchange ("") and route by queue name; if you have explicit bindings, you would use those in your routing key expression.
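With a named exchange and explicit bindings, the routing key expression targets binding keys rather than queue names. A sketch under the same kind of assumptions: `server.partition.index` is a hypothetical per-server property, and the exchange, queue, and key names are illustrative.

```xml
<!-- Omit if a RabbitAdmin is already defined; it declares the queue, exchange and binding below. -->
<rabbit:admin connection-factory="rabbitConnectionFactory" />

<!-- Each server declares its own queue... -->
<rabbit:queue id="prepareExchangesListServerQueue"
    name="prepareExchangesListQueue.${server.partition.index}" />

<!-- ...and the shared direct exchange with a binding for that queue only.
     Re-declaring the same exchange and adding bindings is idempotent, so every server can run this. -->
<rabbit:direct-exchange name="prepareExchangesListExchange">
    <rabbit:bindings>
        <!-- queue refers to the bean id above; key is what the master routes on -->
        <rabbit:binding queue="prepareExchangesListServerQueue"
            key="partition.${server.partition.index}" />
    </rabbit:bindings>
</rabbit:direct-exchange>

<!-- Master: publish to the named exchange and route on the binding key -->
<amqp:outbound-gateway request-channel="prepareExchangesListOutboundChannel"
    reply-channel="prepareExchangesListInboundStagingChannel"
    amqp-template="prepareExchangesListAmqpTemplate"
    exchange-name="prepareExchangesListExchange"
    routing-key-expression="'partition.' + headers['sequenceNumber']"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
```

Each server's `amqp:inbound-gateway` then sets `queue-names` to its own queue name instead of the shared `prepareExchangesListQueue`.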

PS: As a reminder, you need to increase the RabbitTemplate reply-timeout if your partitions take more than the default 5 seconds to complete.



Source: https://stackoverflow.com/questions/24237074/how-to-execute-some-partition-step-on-all-servers-only-once-using-spring-batch-p
