Spring Batch multithreaded processing for a single file split into multiple files

Submitted by 断了今生、忘了曾经 on 2019-12-01 22:29:54

Here is how I solved the problem.

  1. Read the input file and split it into chunks using buffered and FileChannel-based readers and writers (among the fastest ways to read/write files; Spring Batch itself uses the same approach). I implemented this so that it runs before the job starts (though it could also run as a job step via a method invoker). A sketch of such a splitter follows this list.

  2. Start the job with the directory location as a job parameter.

  3. Use a MultiResourcePartitioner, which takes the directory location and creates a slave step for each file, each running in its own thread.

  4. In the slave step, take the file handed over by the partitioner and read it with Spring Batch's item reader.

  5. Use a database item writer (I'm using the MyBatis batch item writer) to push the data to the database.

  6. It's best to make the split count equal to the step's commit count.
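
(Not from the original post — just a minimal sketch of the splitter from step 1, assuming a line-oriented CSV input. The class name FileSplitter, the chunk-N.csv naming, and the linesPerFile parameter are illustrative. It uses java.nio buffered readers/writers, which sit on top of file channels.)

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Splits a large line-oriented file into smaller chunk files that a
 * MultiResourcePartitioner can later hand out, one file per slave step.
 */
public class FileSplitter {

    /** Writes chunks of {@code linesPerFile} lines each into {@code outputDir}. */
    public static void splitFile(Path input, Path outputDir, int linesPerFile) throws IOException {
        Files.createDirectories(outputDir);
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            int lineCount = 0;
            int fileIndex = 0;
            BufferedWriter writer = newChunkWriter(outputDir, fileIndex);
            while ((line = reader.readLine()) != null) {
                if (lineCount == linesPerFile) {
                    writer.close();                       // finish the current chunk file
                    writer = newChunkWriter(outputDir, ++fileIndex);
                    lineCount = 0;
                }
                writer.write(line);
                writer.newLine();
                lineCount++;
            }
            writer.close();                               // finish the last (possibly partial) chunk
        }
    }

    private static BufferedWriter newChunkWriter(Path dir, int index) throws IOException {
        return Files.newBufferedWriter(dir.resolve("chunk-" + index + ".csv"), StandardCharsets.UTF_8);
    }
}

Running this before launching the job and then passing outputDir as the directory job parameter matches steps 1–2 above.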
Luca Basso Ricci
  1. About multi-threaded reading, see the answer to "How to set up multi-threading in Spring Batch?"; it will point you in the right direction. That sample also includes some considerations about restartability for CSV files.
  2. The job should fail automatically if an error occurs on a thread: I have never tried it, but this should be the default behaviour.
  3. "Spring Batch: How to set a time interval between each call in a chunk tasklet" can be a start. See also the official documentation on backoff policies: when retrying after a transient failure, it often helps to wait a bit before trying again, because the failure is usually caused by some problem that will only be resolved by waiting. If a RetryCallback fails, the RetryTemplate can pause execution according to the BackoffPolicy in place (see the sketch after this list).
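
(Not from the original answer — a minimal sketch of that backoff behaviour using Spring Retry's RetryTemplate. The three-attempt limit, the 2-second pause, and the callFlakyService method are illustrative assumptions.)

import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryExample {

    public static void main(String[] args) {
        RetryTemplate retryTemplate = new RetryTemplate();

        // Retry up to 3 times in total before giving up.
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));

        // Wait 2 seconds between attempts, giving the transient failure time to clear.
        FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
        backOffPolicy.setBackOffPeriod(2000L);
        retryTemplate.setBackOffPolicy(backOffPolicy);

        // The RetryCallback: if it throws, the template pauses per the BackOffPolicy, then retries.
        String result = retryTemplate.execute(context -> callFlakyService());
        System.out.println(result);
    }

    private static String callFlakyService() {
        // Placeholder for a call that may fail transiently (e.g. a DB write).
        return "ok";
    }
}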

Let me know if this helps, or how you end up solving the problem, because I'm interested for my own (future) work! I hope these pointers are helpful.

You can split your input file into many files, then use a Partitioner and load the small files with threads; but on error, you must restart the whole job after the DB has been cleaned.

<batch:job id="transformJob">
    <batch:step id="deleteDir" next="cleanDB">
        <batch:tasklet ref="fileDeletingTasklet" />
    </batch:step>
    <batch:step id="cleanDB" next="split">
        <batch:tasklet ref="countThreadTasklet" />
    </batch:step>
    <batch:step id="split" next="partitionerMasterImporter">
        <batch:tasklet>
            <batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
        <batch:partition step="importChunked" partitioner="filePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
</batch:job>
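
(The XML above references beans — fileDeletingTasklet, countThreadTasklet, largeCSVReader, smallCSVWriter, filePartitioner — whose definitions live in the full example. As an illustration only, here is what the fileDeletingTasklet implementation could look like, modelled on the classic Spring Batch samples; the directory property and class name are assumptions, not taken from the original answer.)

import java.io.File;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.core.io.Resource;

/**
 * Deletes every file in the split-output directory so the job always
 * starts from a clean state (the "deleteDir" step in the job above).
 */
public class FileDeletingTasklet implements Tasklet {

    private Resource directory;   // injected via the XML bean definition

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        File dir = directory.getFile();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File file : files) {
                if (!file.delete()) {
                    throw new IllegalStateException("Could not delete file: " + file.getPath());
                }
            }
        }
        return RepeatStatus.FINISHED;
    }

    public void setDirectory(Resource directory) {
        this.directory = directory;
    }
}

It would be wired up as a plain bean, e.g. with a property named directory pointing at the split-output location, and referenced from the step via ref="fileDeletingTasklet" as shown above.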

Full example code (on GitHub).

Hope this helps.
