问题
I have a ItemStreamReader
(extends AbstractItemCountingItemStreamItemReader
), the reader on its own is quite fast, but the the following processing takes quite some time.
From a business point of view I can process as many items in parallel as I want.
As my ItemStreamReader
is reading a large JSON file with a JsonParser, it ends up to be statefull. So just adding a TaskExecutor
to the Step
does not work and throws parsing exceptions and the following log output by spring batch:
16:51:41.023 [main] WARN o.s.b.c.s.b.FaultTolerantStepBuilder - Asynchronous TaskExecutor detected with ItemStream reader. This is probably an error, and may lead to incorrect restart data being stored.
16:52:29.790 [jobLauncherTaskExecutor-1] WARN o.s.b.core.step.item.ChunkMonitor - No ItemReader set (must be concurrent step), so ignoring offset data.
16:52:31.908 [feed-import-1] WARN o.s.b.core.step.item.ChunkMonitor - ItemStream was opened in a different thread. Restart data could be compromised.
How can I execute the processing in my Step to be executed in parallel by multiple threads?
回答1:
Spring Batch provides a number of ways to parallelize processing. In your case, since processing seems to be the bottle neck, I'd recommend looking at two options:
AsyncItemProcessor/AsyncItemWriter
The AsyncItemProcessor
and AsyncItemWriter
work in tandem to parallelize the processing of items within a chunk. You can think of them as a kind of fork/join concept. The items within the chunk are read by a single thread as normal. The AsyncItemProcessor
wraps your normal ItemProcessor
and executes that logic on a different thread, returning a Future
instead of the actual item. The AsyncItemWriter
then waits for the Future
to return the processed item before writing it. These classes are found in the Spring Batch Integration module. You can read more about them in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
Remote Chunking
The AsyncItemProcessor
/AsyncItemWriter
paradigm works well in a single JVM, but if you need to scale your processing further, you may want to take a look at remote chunking. Remote chunking is designed to scale the processor piece of a step to beyond a single JVM. Using a master/slave configuration, the master reads the input using a regular ItemReader
. Then the items are sent via Spring Integration channels to the slaves for processing. The results can either be written in the slave or returned to the master for writing. It's important to note that in this approach, each item read by the master will go over the wire so it can be very IO intensive and should only be considered if the processing bottle neck is worse than the potential impact of sending the messages. You can read more about remote chunking in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#externalizing-batch-process-execution
来源:https://stackoverflow.com/questions/27803478/parallel-step-execution-of-itemstreamreader-in-springbatch