Question
After reading this answer (by Michael Minella):
Spring Batch chunk processing: how does the reader work if the result set changes?
I assume that with JdbcPagingItemReader, the query is run again for each page. In that case, when a new page is read, a new record may have been inserted at a position before the start of this page, causing the last record of the previous page to be processed again.
Does this mean that, to prevent a record from being reprocessed, I must always manually set a "processed already" flag in the input data and check it before writing? Is this a feasible approach?
The same question applies to JdbcCursorItemReader when the process is interrupted (e.g. by a power outage) and restarted: what happens if a new record has been inserted before the current index that is saved in the ExecutionContext?
Answer 1:
Your assumptions are right.
For JdbcPagingItemReader, this also depends on the isolation level of your transaction (READ_COMMITTED, READ_UNCOMMITTED, ...).
For JdbcCursorItemReader, you have to ensure that the query returns exactly the same result set (including order) on a restart. Otherwise, the results are unpredictable.
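To see why the result set must be identical, here is a simplified model in plain Java (not Spring Batch code). It assumes, as I understand the counting readers to work, that on restart the reader re-executes its query and skips as many rows as the read count saved in the ExecutionContext. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Spring Batch code): on restart, a cursor-based reader
// re-executes its query and skips the number of items it had already read,
// as recorded in the saved execution state.
final class CursorRestartModel {

    // Simulates a restart: skip `alreadyRead` rows of the freshly re-executed
    // query and return the rows that would be handed to the step next.
    static List<String> resumeAfter(List<String> reExecutedResult, int alreadyRead) {
        List<String> remaining = new ArrayList<>();
        for (int i = alreadyRead; i < reExecutedResult.size(); i++) {
            remaining.add(reExecutedResult.get(i));
        }
        return remaining;
    }

    public static void main(String[] args) {
        // First run: rows A, B, C, D ordered by key; the job dies after
        // committing A and B, so the saved read count is 2.
        int savedReadCount = 2;

        // Before the restart, a new row "A0" is inserted that sorts before B.
        List<String> afterInsert = List.of("A", "A0", "B", "C", "D");

        // Skipping 2 rows now lands on B, which was already processed.
        System.out.println(resumeAfter(afterInsert, savedReadCount)); // [B, C, D]
    }
}
```

In this model, B is processed twice and A0 is never processed, which is the "unpredictable" outcome described above.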
In the batches I write, if I cannot guarantee that the selection will produce the same results after a crash, I often save the result of the selection into a CSV file in the first step and configure the reader with "saveState=false". So, if the first step fails, a restart produces a completely new CSV file. After the first step, all the entries that need to be processed are in a file. Since this file cannot change, a restart can continue from the last successful chunk from the second step onward.
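The two-step idea can be sketched in plain Java as follows. This is a hypothetical illustration, not the Spring Batch configuration the answer refers to: `snapshot` stands in for the first step (rerun from scratch on failure), and `processFrom` stands in for the second step resuming from a saved line index, which is safe only because the file is frozen.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-step sketch (plain Java, not Spring Batch):
// step 1 freezes the query result into a CSV file; step 2 processes the file
// and can restart from a saved line index because the file never changes.
final class SnapshotThenProcess {

    // Step 1: write the current selection to a file. If this step fails, it is
    // simply rerun from scratch and overwrites the file (the saveState=false idea).
    static void snapshot(List<String> selection, Path csv) throws IOException {
        Files.write(csv, selection);
    }

    // Step 2: process lines starting at `startLine` (the last committed
    // position) and return the lines handled in this run.
    static List<String> processFrom(Path csv, int startLine) throws IOException {
        List<String> lines = Files.readAllLines(csv);
        List<String> handled = new ArrayList<>();
        for (int i = startLine; i < lines.size(); i++) {
            handled.add(lines.get(i)); // real processing/writing would go here
        }
        return handled;
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("selection", ".csv");
        snapshot(List.of("1,A", "2,B", "3,C", "4,D"), csv);
        // Suppose step 2 crashed after committing two lines. Rows inserted into
        // the source table since step 1 do not affect the file, so resuming at
        // line 2 is safe.
        System.out.println(processFrom(csv, 2)); // [3,C, 4,D]
        Files.delete(csv);
    }
}
```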
Edit: Using a "state column" works well if you have a single step that does the reading (with the state column in its WHERE clause), the processing, and the writing (updating the state column to 'processed'). If such a job fails, you just start it again as a new launch.
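A minimal in-memory model of the state-column approach, assuming a table with an id and a state column (the names `StateColumnModel`, `State`, and `launch` are hypothetical, and a map stands in for the table). Each launch selects only NEW rows and marks each row PROCESSED as it is handled, so a fresh launch after a failure picks up exactly the unfinished rows, including rows inserted in the meantime.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the "state column" approach (not Spring
// Batch code). Each launch selects only rows whose state is NEW, processes
// them, and flips their state to PROCESSED in the same unit of work.
final class StateColumnModel {

    enum State { NEW, PROCESSED }

    // Stand-in for a table: id -> state column, in insertion order.
    final Map<Integer, State> table = new LinkedHashMap<>();

    // One job launch. Processes at most `failAfter` rows and then "crashes"
    // to simulate a failed run; returns the ids processed in this launch.
    List<Integer> launch(int failAfter) {
        // Equivalent of: SELECT id FROM t WHERE state = 'NEW'
        List<Integer> pending = new ArrayList<>();
        for (Map.Entry<Integer, State> row : table.entrySet()) {
            if (row.getValue() == State.NEW) pending.add(row.getKey());
        }
        List<Integer> done = new ArrayList<>();
        for (Integer id : pending) {
            if (done.size() == failAfter) break; // simulated crash
            // process(id) would run here; then UPDATE t SET state = 'PROCESSED'
            table.put(id, State.PROCESSED);
            done.add(id);
        }
        return done;
    }

    public static void main(String[] args) {
        StateColumnModel m = new StateColumnModel();
        for (int id = 1; id <= 4; id++) m.table.put(id, State.NEW);

        System.out.println(m.launch(2));                 // first launch fails after [1, 2]
        m.table.put(0, State.NEW);                       // row inserted between launches
        System.out.println(m.launch(Integer.MAX_VALUE)); // new launch: [3, 4, 0]
    }
}
```

Because the WHERE clause itself excludes finished rows, no saved read position is needed, which fits the advice to start a new launch rather than restart.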
Source: https://stackoverflow.com/questions/39009949/how-do-readers-keep-track-of-current-position-in-case-query-result-changes