Question
I have a table of over 1 million customers. Each customer's information changes often, but is updated at most once a day. I have a Spring Batch job which:
- reads a customer from the customer table (JdbcCursorItemReader)
- processes the customer information (ItemProcessor)
- writes back to the customer table (ItemWriter)
I want to run 10 jobs at once, all reading from the one customer table, without any customer being read twice. Is this possible with Spring Batch, or is this something I will have to handle at the database level using a crawlLog table, as mentioned in this post?
How do I lock read/write to MySQL tables so that I can select and then insert without other programs reading/writing to the database?
I know that parameters can be passed to a job. I could read all the customer ids and distribute them evenly across the 10 jobs. But would this be the right way of doing it?
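The id-distribution idea above can be sketched in plain Java: split the full id range into contiguous, evenly sized sub-ranges, and pass each sub-range to one job instance as job parameters. This is a minimal illustration, not Spring Batch API; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitioner {

    // Splits the inclusive id range [minId, maxId] into `partitions`
    // contiguous sub-ranges. Each {start, end} pair would be passed to
    // one job instance as job parameters, so no two jobs overlap.
    static List<long[]> partition(long minId, long maxId, int partitions) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long size = (total + partitions - 1) / partitions; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1,000,000 customers split across 10 jobs
        for (long[] r : partition(1, 1_000_000, 10)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

Each reader would then use the range in its query, e.g. `WHERE id BETWEEN ? AND ?`, so the jobs never see the same customer.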
Answer 1:
The framework offers several ways to do this, depending on your setup. The simplest is to add a task executor to the step:
<step id="copy">
    <tasklet task-executor="taskExecutor" throttle-limit="10">
        ...
    </tasklet>
</step>

<beans:bean id="taskExecutor"
            class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="10"/>
    <property name="maxPoolSize" value="15"/>
</beans:bean>
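Note that a multi-threaded step alone does not guarantee each customer is read exactly once, since JdbcCursorItemReader is not thread-safe. One of the other techniques mentioned below, a partitioned step, assigns each thread its own non-overlapping slice of data. A sketch in the same XML style (the `rangePartitioner` bean is an assumption: a Partitioner implementation you would write, e.g. one that splits the customer id range):

```xml
<step id="customerMaster">
    <partition step="copy" partitioner="rangePartitioner">
        <handler grid-size="10" task-executor="taskExecutor"/>
    </partition>
</step>
```

Each of the 10 partitions runs the `copy` step with its own reader, parameterized from the partitioner's ExecutionContext, so no customer is read twice.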
You may want to have a look at this and the other techniques in the official Spring Batch documentation on scalability.
Source: https://stackoverflow.com/questions/16820304/how-to-run-concurrent-jobs-in-spring-batch-without-overlapping-data-read