Spring Batch multithreaded processing for a single file split into multiple files

Submitted by 断了今生、忘了曾经 on 2019-12-01 22:29:54

Here is how I solved the problem.

  1. Read the input file and split it into chunks using buffered and FileChannel-based readers and writers (among the fastest ways to read/write files; Spring Batch itself uses the same approach). I implemented this so that it runs before the job starts (though it could also run as a job step via a method invoker). A sketch of such a splitter follows this list.

  2. Start the job with the directory location as a job parameter.

  3. Use a MultiResourcePartitioner, which takes the directory location and creates a slave step for each file, each running in its own thread.

  4. In the slave step, take the file handed over by the partitioner and read it with Spring Batch's item reader.

  5. Use a database item writer (I'm using the MyBatis batch item writer) to push the data to the database.

  6. It's best to make the split count equal to the step's commit count.
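
(Not from the original post — just a minimal sketch of the splitter from step 1, assuming a line-oriented CSV input. The class name FileSplitter, the chunk-N.csv naming, and the linesPerFile parameter are illustrative. It uses java.nio buffered readers/writers, which sit on top of file channels.)

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Splits a large line-oriented file into smaller chunk files that a
 * MultiResourcePartitioner can later hand out, one file per slave step.
 */
public class FileSplitter {

    /** Writes chunks of {@code linesPerFile} lines each into {@code outputDir}. */
    public static void splitFile(Path input, Path outputDir, int linesPerFile) throws IOException {
        Files.createDirectories(outputDir);
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            int lineCount = 0;
            int fileIndex = 0;
            BufferedWriter writer = newChunkWriter(outputDir, fileIndex);
            while ((line = reader.readLine()) != null) {
                if (lineCount == linesPerFile) {
                    writer.close();                       // finish the current chunk file
                    writer = newChunkWriter(outputDir, ++fileIndex);
                    lineCount = 0;
                }
                writer.write(line);
                writer.newLine();
                lineCount++;
            }
            writer.close();                               // finish the last (possibly partial) chunk
        }
    }

    private static BufferedWriter newChunkWriter(Path dir, int index) throws IOException {
        return Files.newBufferedWriter(dir.resolve("chunk-" + index + ".csv"), StandardCharsets.UTF_8);
    }
}

Running this before launching the job and then passing outputDir as the directory job parameter matches steps 1–2 above.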
Luca Basso Ricci
  1. About multi-threaded reading, see the answer to "How to set up multi-threading in Spring Batch?"; it will point you in the right direction. That sample also includes some considerations about restartability for CSV files.
  2. The job should fail automatically if an error occurs on a thread: I have never tried it, but this should be the default behaviour.
  3. "Spring Batch: How to set a time interval between each call in a chunk tasklet" can be a start. See also the official documentation on backoff policies: when retrying after a transient failure, it often helps to wait a bit before trying again, because the failure is usually caused by some problem that will only be resolved by waiting. If a RetryCallback fails, the RetryTemplate can pause execution according to the BackoffPolicy in place (see the sketch after this list).
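
(Not from the original answer — a minimal sketch of that backoff behaviour using Spring Retry's RetryTemplate. The three-attempt limit, the 2-second pause, and the callFlakyService method are illustrative assumptions.)

import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryExample {

    public static void main(String[] args) {
        RetryTemplate retryTemplate = new RetryTemplate();

        // Retry up to 3 times in total before giving up.
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));

        // Wait 2 seconds between attempts, giving the transient failure time to clear.
        FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
        backOffPolicy.setBackOffPeriod(2000L);
        retryTemplate.setBackOffPolicy(backOffPolicy);

        // The RetryCallback: if it throws, the template pauses per the BackOffPolicy, then retries.
        String result = retryTemplate.execute(context -> callFlakyService());
        System.out.println(result);
    }

    private static String callFlakyService() {
        // Placeholder for a call that may fail transiently (e.g. a DB write).
        return "ok";
    }
}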

Let me know if this helps, or how you end up solving the problem, because I'm interested for my own (future) work! I hope these pointers are helpful.

You can split your input file into many files, then use a Partitioner and load the small files with threads; but on error, you must restart the whole job after the DB has been cleaned.

<batch:job id="transformJob">
    <batch:step id="deleteDir" next="cleanDB">
        <batch:tasklet ref="fileDeletingTasklet" />
    </batch:step>
    <batch:step id="cleanDB" next="split">
        <batch:tasklet ref="countThreadTasklet" />
    </batch:step>
    <batch:step id="split" next="partitionerMasterImporter">
        <batch:tasklet>
            <batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
        <batch:partition step="importChunked" partitioner="filePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
</batch:job>
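
(The XML above references beans — fileDeletingTasklet, countThreadTasklet, largeCSVReader, smallCSVWriter, filePartitioner — whose definitions live in the full example. As an illustration only, here is what the fileDeletingTasklet implementation could look like, modelled on the classic Spring Batch samples; the directory property and class name are assumptions, not taken from the original answer.)

import java.io.File;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.core.io.Resource;

/**
 * Deletes every file in the split-output directory so the job always
 * starts from a clean state (the "deleteDir" step in the job above).
 */
public class FileDeletingTasklet implements Tasklet {

    private Resource directory;   // injected via the XML bean definition

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        File dir = directory.getFile();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File file : files) {
                if (!file.delete()) {
                    throw new IllegalStateException("Could not delete file: " + file.getPath());
                }
            }
        }
        return RepeatStatus.FINISHED;
    }

    public void setDirectory(Resource directory) {
        this.directory = directory;
    }
}

It would be wired up as a plain bean, e.g. with a property named directory pointing at the split-output location, and referenced from the step via ref="fileDeletingTasklet" as shown above.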

Full example code (on GitHub).

Hope this helps.
