How do readers keep track of current position in case query result changes?

Submitted by 牧云@^-^@ on 2019-12-12 03:37:31

Question


After reading this answer (by Michael Minella):

Spring batch chunk processing, how does the reader work if the result set changes?

I assume that with JdbcPagingItemReader the query is run again for each page. In that case, a new record may have been inserted at a position before the page being read, which would cause the last record of the previous page to be processed again.

Does this mean that, to prevent a record from being reprocessed, I must always set a "processed already" flag manually in the input data and check it before writing? Is this a feasible approach?

The same question applies to a JdbcCursorItemReader when the process is interrupted (for example by a power outage) and restarted. What happens if a new record has been inserted before the current index that is saved in the ExecutionContext?


Answer 1:


Your assumptions are right.

In the case of the JdbcPagingItemReader, this also depends on the isolation level of your transaction (READ_COMMITTED, READ_UNCOMMITTED, ...).

In the case of the JdbcCursorItemReader, you have to ensure that the query returns exactly the same result set (including order) on a restart. Otherwise the results are unpredictable.
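To make the restart problem concrete, here is a minimal in-memory sketch of a reader that resumes by item count. It is plain Java, not Spring Batch code, and the class and method names (`RestartDriftDemo`, `readFrom`, `run`) are made up for illustration. It assumes the reader resumes by skipping as many rows as it had read before the crash.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a cursor-style reader that restarts by item count.
// All names here are hypothetical; this is not Spring Batch code.
class RestartDriftDemo {

    // Returns up to `limit` rows starting at 0-based position `start`, like a
    // reader that skips `start` rows and then continues reading.
    static List<Integer> readFrom(List<Integer> orderedRows, int start, int limit) {
        int end = Math.min(orderedRows.size(), start + limit);
        return new ArrayList<>(orderedRows.subList(start, end));
    }

    // Reads three rows, "crashes", optionally inserts a row, then restarts
    // from the saved item count.
    static List<Integer> run(boolean insertBeforeRestart) {
        List<Integer> table = new ArrayList<>(List.of(10, 20, 30, 40, 50)); // ORDER BY id
        int readBeforeCrash = 3; // the position that would be saved for restart

        List<Integer> processed = readFrom(table, 0, readBeforeCrash);

        if (insertBeforeRestart) {
            table.add(1, 15); // new row lands before the saved position
        }

        // Restart: skip `readBeforeCrash` rows and read the rest.
        processed.addAll(readFrom(table, readBeforeCrash, table.size()));
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [10, 20, 30, 40, 50]
        System.out.println(run(true));  // [10, 20, 30, 30, 40, 50]
    }
}
```

In the second run, row 30 is processed twice and the new row 15 is never seen, which is the kind of unpredictable result a changed result set can cause.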

In the batches I write, if I cannot guarantee that the selection will produce the same results after a crash, I often save the result of the selection into a CSV file in the first step and configure that step's reader with "saveState=false". If the first step fails, a restart produces a completely new CSV file. After the first step, all the entries that need to be processed are in a file. This file cannot change, so from the second step onward a restart can continue processing from the last successful chunk.
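The two-step idea can be sketched with plain Java file I/O. This is only an illustration under assumptions, not Spring Batch configuration: the names `SnapshotStepDemo`, `exportSelection` and `processFrom` are hypothetical, the "table" is an in-memory list, and the saved restart position is a simple line index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-step flow using plain Java I/O; not Spring Batch code.
class SnapshotStepDemo {

    // Step 1: dump the current selection to a file. It keeps no restart state;
    // if it fails, it is simply run again and overwrites the file.
    static void exportSelection(List<Integer> table, Path csv) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Integer id : table) {
            lines.add(String.valueOf(id));
        }
        Files.write(csv, lines);
    }

    // Step 2: read the frozen file, starting at a saved 0-based line index.
    static List<Integer> processFrom(Path csv, int startLine) throws IOException {
        List<String> lines = Files.readAllLines(csv);
        List<Integer> processed = new ArrayList<>();
        for (String line : lines.subList(startLine, lines.size())) {
            processed.add(Integer.parseInt(line));
        }
        return processed;
    }

    static List<Integer> run() {
        List<Integer> table = new ArrayList<>(List.of(10, 20, 30, 40, 50));
        try {
            Path csv = Files.createTempFile("selection", ".csv");
            try {
                exportSelection(table, csv);

                // Step 2 handles three lines, then "crashes".
                List<Integer> processed = new ArrayList<>(processFrom(csv, 0).subList(0, 3));

                table.add(1, 15); // the table changes, the file does not

                // Restart of step 2 from the saved line index.
                processed.addAll(processFrom(csv, 3));
                return processed;
            } finally {
                Files.deleteIfExists(csv);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // [10, 20, 30, 40, 50]
    }
}
```

Because the second step reads only the file, the row inserted after the export does not shift the saved position. That row is left for the next job run, when the selection is exported again.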

Edited: Using a "state-column" works well if you have a single step that does the reading (with the state-column in its where-clause), the processing, and the writing (which updates the state-column to 'processed'). If such a job fails, you just have to start it again as a new launch.
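A rough in-memory sketch of the state-column approach follows. The `TreeMap` stands in for a table ordered by id with a boolean "processed" column, and `StateColumnDemo` and `launch` are hypothetical names. It also assumes the flag update is committed together with the write, which in a real job would be the chunk transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a table with a "processed" state column.
// All names are hypothetical; this is not Spring Batch code.
class StateColumnDemo {

    // One job launch: select unprocessed rows, handle them in id order and
    // flag each one. Stops after `crashAfter` rows to simulate a failure.
    static List<Integer> launch(TreeMap<Integer, Boolean> table, int crashAfter) {
        // Equivalent of "WHERE processed = false ORDER BY id".
        List<Integer> pending = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> row : table.entrySet()) {
            if (!row.getValue()) {
                pending.add(row.getKey());
            }
        }

        List<Integer> done = new ArrayList<>();
        for (Integer id : pending) {
            if (done.size() == crashAfter) {
                break; // simulated crash
            }
            done.add(id);        // "write" the item
            table.put(id, true); // and flag it in the same unit of work
        }
        return done;
    }

    static List<Integer> run() {
        TreeMap<Integer, Boolean> table = new TreeMap<>();
        for (int id : new int[] {10, 20, 30, 40, 50}) {
            table.put(id, false);
        }

        // First launch fails after three rows.
        List<Integer> all = new ArrayList<>(launch(table, 3));

        table.put(15, false); // a row is inserted before the failure point

        // New launch: no saved position, only the flag decides what is read.
        all.addAll(launch(table, Integer.MAX_VALUE));
        return all;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [10, 20, 30, 15, 40, 50]
    }
}
```

Each row is handled exactly once across both launches, including the row inserted in between, because the selection depends on the flag and not on a remembered position.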



Source: https://stackoverflow.com/questions/39009949/how-do-readers-keep-track-of-current-position-in-case-query-result-changes
