Question
I have a table of over 1 million customers. Each customer's information changes often, but is updated at most once a day. I have a Spring Batch job which:
- reads a customer from the customer table (JdbcCursorItemReader)
- processes the customer information (ItemProcessor)
- writes back to the customer table (ItemWriter)
I want to run 10 jobs at once, all reading from the one customer table, without any customer being read twice. Is this possible with Spring Batch, or is this something I will have to handle at the database level using a crawlLog table, as mentioned in this post?
How do I lock read/write to MySQL tables so that I can select and then insert without other programs reading/writing to the database?
I know that parameters can be passed to a job. I could read all the customer ids and distribute them evenly across the 10 jobs. But would this be the right way of doing it?
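The id-distribution idea above can be sketched in plain Java: split the full id range into contiguous, evenly sized sub-ranges, and pass each sub-range to one job instance as job parameters. This is a minimal illustration, not Spring Batch API; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitioner {

    // Splits the inclusive id range [minId, maxId] into `partitions`
    // contiguous sub-ranges. Each {start, end} pair would be passed to
    // one job instance as job parameters, so no two jobs overlap.
    static List<long[]> partition(long minId, long maxId, int partitions) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long size = (total + partitions - 1) / partitions; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1,000,000 customers split across 10 jobs
        for (long[] r : partition(1, 1_000_000, 10)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

Each reader would then use the range in its query, e.g. `WHERE id BETWEEN ? AND ?`, so the jobs never see the same customer.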
Answer 1:
The framework offers several ways to do this, depending on your setup. The simplest is to add a task executor to the step:
<step id="copy">
    <tasklet task-executor="taskExecutor" throttle-limit="10">
        ...
    </tasklet>
</step>

<beans:bean id="taskExecutor"
            class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="10"/>
    <property name="maxPoolSize" value="15"/>
</beans:bean>
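Note that a multi-threaded step alone does not guarantee each customer is read exactly once, since JdbcCursorItemReader is not thread-safe. One of the other techniques mentioned below, a partitioned step, assigns each thread its own non-overlapping slice of data. A sketch in the same XML style (the `rangePartitioner` bean is an assumption: a Partitioner implementation you would write, e.g. one that splits the customer id range):

```xml
<step id="customerMaster">
    <partition step="copy" partitioner="rangePartitioner">
        <handler grid-size="10" task-executor="taskExecutor"/>
    </partition>
</step>
```

Each of the 10 partitions runs the `copy` step with its own reader, parameterized from the partitioner's ExecutionContext, so no customer is read twice.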
You may want to have a look at this and the other techniques in the official Spring Batch documentation on scalability.
Source: https://stackoverflow.com/questions/16820304/how-to-run-concurrent-jobs-in-spring-batch-without-overlapping-data-read