Parallel step execution of ItemStreamReader in SpringBatch

江枫思渺然 submitted on 2021-02-08 07:42:35

Question


I have an ItemStreamReader (it extends AbstractItemCountingItemStreamItemReader). The reader on its own is quite fast, but the processing that follows takes quite some time. From a business point of view, I can process as many items in parallel as I want.

Because my ItemStreamReader reads a large JSON file with a JsonParser, it ends up being stateful. So just adding a TaskExecutor to the Step does not work: it throws parsing exceptions, and Spring Batch logs the following:

16:51:41.023 [main] WARN  o.s.b.c.s.b.FaultTolerantStepBuilder - Asynchronous TaskExecutor detected with ItemStream reader.  This is probably an error, and may lead to incorrect restart data being stored.
16:52:29.790 [jobLauncherTaskExecutor-1] WARN  o.s.b.core.step.item.ChunkMonitor - No ItemReader set (must be concurrent step), so ignoring offset data.
16:52:31.908 [feed-import-1] WARN  o.s.b.core.step.item.ChunkMonitor - ItemStream was opened in a different thread.  Restart data could be compromised.

How can I have the processing in my Step executed in parallel by multiple threads?


Answer 1:


Spring Batch provides a number of ways to parallelize processing. In your case, since processing seems to be the bottleneck, I'd recommend looking at two options:

AsyncItemProcessor/AsyncItemWriter
The AsyncItemProcessor and AsyncItemWriter work in tandem to parallelize the processing of items within a chunk. You can think of them as a kind of fork/join concept. The items within the chunk are read by a single thread as normal. The AsyncItemProcessor wraps your normal ItemProcessor and executes that logic on a different thread, returning a Future instead of the actual item. The AsyncItemWriter then waits for the Future to return the processed item before writing it. These classes are found in the Spring Batch Integration module. You can read more about them in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
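
To make the fork/join idea above concrete, here is a minimal plain-JDK sketch of the pattern. It is not the Spring Batch API: the class AsyncChunkSketch and the methods processAsync, writeAll and runChunk are hypothetical names made up for illustration, and an upper-casing function stands in for the slow ItemProcessor. One thread hands each item of the chunk to a thread pool and gets a Future back, and the "writer" side then waits on each Future in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-JDK illustration of the pattern only; all names here are hypothetical.
class AsyncChunkSketch {

    // "Fork": a single thread walks the chunk and submits each item to the pool,
    // getting a Future per item (the role AsyncItemProcessor plays in the answer).
    static <I, O> List<Future<O>> processAsync(List<I> chunk,
                                               Function<I, O> processor,
                                               ExecutorService pool) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : chunk) {
            Callable<O> task = () -> processor.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // "Join": wait for each Future in submission order before "writing" the result
    // (the role AsyncItemWriter plays in the answer).
    static <O> List<O> writeAll(List<Future<O>> futures) {
        List<O> written = new ArrayList<>();
        for (Future<O> future : futures) {
            try {
                written.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return written;
    }

    // Processes one chunk with the given number of threads; upper-casing is a
    // stand-in for the expensive processing step.
    static List<String> runChunk(List<String> chunk, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            return writeAll(processAsync(chunk, String::toUpperCase, pool));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runChunk(List.of("a", "b", "c"), 2)); // prints [A, B, C]
    }
}
```

Because reading stays on one thread, a stateful reader such as the JSON one in the question is never touched concurrently; only the processing is spread across threads.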

Remote Chunking
The AsyncItemProcessor/AsyncItemWriter paradigm works well in a single JVM, but if you need to scale your processing further, you may want to take a look at remote chunking. Remote chunking is designed to scale the processor piece of a step beyond a single JVM. Using a master/slave configuration, the master reads the input using a regular ItemReader. The items are then sent via Spring Integration channels to the slaves for processing. The results can either be written in the slave or returned to the master for writing. It's important to note that in this approach, each item read by the master goes over the wire, so it can be very I/O intensive. It should only be considered if the processing bottleneck is worse than the potential impact of sending the messages. You can read more about remote chunking in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#externalizing-batch-process-execution
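
The message flow described above can be sketched in a single JVM, with two in-memory queues standing in for the Spring Integration channels. This is only an illustration under that assumption, not a real remote chunking setup: RemoteChunkingSketch and run are hypothetical names, worker threads play the slaves, and an empty Optional is an invented stop marker. Error handling is left out, so a processor that throws would leave the master waiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// In-process simulation only; all names here are hypothetical.
class RemoteChunkingSketch {

    static <I, O> List<O> run(List<I> input, Function<I, O> processor, int workers) {
        // Stand-ins for the request and reply channels between master and workers.
        BlockingQueue<Optional<I>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<O> replies = new LinkedBlockingQueue<>();

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    // Each worker processes requests until it receives the stop marker.
                    for (Optional<I> msg = requests.take(); msg.isPresent(); msg = requests.take()) {
                        replies.put(processor.apply(msg.get()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }

        try {
            // Master side: a single thread "reads" and sends every item over the channel.
            for (I item : input) {
                requests.put(Optional.of(item));
            }
            for (int w = 0; w < workers; w++) {
                requests.put(Optional.empty()); // one stop marker per worker
            }
            // Results come back to the master for writing, in no particular order.
            List<O> written = new ArrayList<>();
            for (int i = 0; i < input.size(); i++) {
                written.add(replies.take());
            }
            for (Thread worker : threads) {
                worker.join();
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        Function<Integer, Integer> square = x -> x * x;
        List<Integer> out = new ArrayList<>(run(input, square, 2));
        Collections.sort(out); // reply order is not deterministic, so sort before printing
        System.out.println(out); // prints [1, 4, 9, 16]
    }
}
```

In this simulation a queue hand-off is cheap. In real remote chunking each put and take corresponds to a message sent over the network, which is the cost the answer warns about.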



Source: https://stackoverflow.com/questions/27803478/parallel-step-execution-of-itemstreamreader-in-springbatch
