Deciding between Spring Batch Step, Tasklet or Chunks

Submitted anonymously (unverified) on 2019-12-03 02:49:01

Question:

I have a straightforward requirement: I need to read a list of items from the DB, process the items, and once processed, update them back into the DB.

I'm thinking of using Spring Batch chunks with a reader, processor and writer. My reader will return one item at a time from the list and send it to the processor; once processing is over, the item goes to the writer, which updates the DB.

I may multithread it later, at some cost of synchronization in these methods.

Here I foresee a few concerns.

  1. The number of items to be processed could be large, perhaps 10,000 or even more.
  2. Some logical calculation is required in the processor, hence processing one item at a time. I'm not sure about the performance, even if it is multithreaded with 10 threads.
  3. The writer can update the results in the DB for each processed item, but I'm not sure how to do batch updates because it only ever has one item processed and ready.

Is this approach correct for this kind of use case, or can something better be done? Is there another way of processing a bunch of items in one call of reader, processor and writer? If so, do I need to create some mechanism where I extract, say, 10 items from the list and give them to the processor? It seems the writer updates each record as it comes; batch updates make sense only if the writer receives a bunch of processed items. Any suggestions?

Please shed some light on this design for better performance.

Thanks,

Answer 1:

Spring Batch is the perfect tool to do what you need.

The chunk-oriented step lets you configure how many items you want to read/process/write with the commit-interval property:

```xml
<batch:step id="step1" next="step2">
    <batch:tasklet transaction-manager="transactionManager" start-limit="100">
        <batch:chunk reader="myReader" processor="myProcessor" writer="MyWriter" commit-interval="800" />
        <batch:listeners>
            <batch:listener ref="myListener" />
        </batch:listeners>
    </batch:tasklet>
</batch:step>
```

Say your reader runs a SELECT statement that returns 10,000 records, and you set commit-interval=500.

The framework will call MyReader's read() method 500 times; each call returns one item (in practice, the reader implementation consumes one row from the ResultSet per call). For each call to read(), it will also call MyProcessor's process() method.

But it will not call MyWriter's write() method until the commit-interval is reached.
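The read/process/buffer/write rhythm described above can be sketched in plain Java, with no Spring involved (the item type and the uppercase "processing" are just placeholders for illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch of the chunk-oriented loop: read and process one item
// at a time, buffer the results, and hand the "writer" a whole chunk once
// commit-interval items have been collected.
public class ChunkLoopSketch {

    // Runs the read -> process -> write cycle and returns the size of each
    // chunk passed to the writer, so the batching behaviour is visible.
    static List<Integer> run(Iterator<String> reader, int commitInterval) {
        List<Integer> chunkSizes = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            String item = reader.next();            // read() one item
            String processed = item.toUpperCase();  // process() one item
            buffer.add(processed);
            if (buffer.size() == commitInterval) {  // write() a full chunk
                chunkSizes.add(buffer.size());
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {                    // final, possibly short chunk
            chunkSizes.add(buffer.size());
        }
        return chunkSizes;
    }
}
```

With 10 items and a commit-interval of 4, the writer is invoked three times, with chunks of 4, 4 and 2 items.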

If you look at the definition of the ItemWriter interface:

```java
public interface ItemWriter<T> {

    /**
     * Process the supplied data element. Will not be called with any null items
     * in normal operation.
     *
     * @throws Exception if there are errors. The framework will catch the
     * exception and convert or rethrow it as appropriate.
     */
    void write(List<? extends T> items) throws Exception;

}
```

You can see that write() receives a List of items. This list will be the size of your commit-interval (or smaller if the end of the data is reached).
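Because write() receives the whole chunk, batch updates come essentially for free if you use Spring Batch's built-in JdbcBatchItemWriter, which issues the chunk's items as a single JDBC batch. A hedged sketch (the table, columns and Item class are assumptions, not taken from the question):

```java
// Hypothetical bean definition: names and SQL are illustrative only.
@Bean
public JdbcBatchItemWriter<Item> myWriter(DataSource dataSource) {
    JdbcBatchItemWriter<Item> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    // One parameterized statement, executed as a JDBC batch per chunk.
    writer.setSql("UPDATE item SET result = :result WHERE id = :id");
    // Maps Item bean properties (result, id) onto the named parameters.
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    return writer;
}
```

This directly addresses concern #3 in the question: the writer never sees one item at a time, so no extra batching mechanism is needed.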

And by the way, 10,000 records is nothing. You may consider multithreading if you have to deal with millions of records, but even then, simply tuning the commit-interval to its sweet spot will probably be enough.
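If you do eventually need multithreading, the tasklet from the step configuration above can be given a task-executor, which runs chunks in parallel (a sketch; the executor bean name and throttle value are assumptions, and your reader must then be thread-safe):

```xml
<batch:tasklet transaction-manager="transactionManager"
               task-executor="taskExecutor" throttle-limit="10">
    <batch:chunk reader="myReader" processor="myProcessor" writer="MyWriter" commit-interval="800" />
</batch:tasklet>

<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
```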

Hope this helps.


