I'm using Spring Batch (3.0.1.RELEASE) / JPA and an HSQLDB server database. I need to browse an entire table (using paging) and update items (one by one). So I used a jpaP
I faced the same case: my reader was a JpaPagingItemReader that queried on a field that was updated in the writer. As a result, half of the items that needed updating were skipped, because the page window kept advancing while the items already read no longer matched the reader's query.
The simplest workaround for me was to override the getPage method on the JpaPagingItemReader so that it always returns the first page.
JpaPagingItemReader<XXXXX> jpaPagingItemReader = new JpaPagingItemReader<XXXXX>() {
    // Always query the first page: items already updated no longer match the query.
    @Override
    public int getPage() {
        return 0;
    }
};
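To see why pinning the page to 0 reads every row, here is a minimal plain-Java sketch of the idea. It does not use Spring Batch or a database: the class name `FirstPageOnlyDemo`, the `process` method and the set of active ids are hypothetical stand-ins for the table and the step, and "deactivating" an item is modelled by removing its id from the set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical simulation, not Spring Batch code.
final class FirstPageOnlyDemo {

    // Repeatedly reads "the first page of active rows" and deactivates what it read.
    static List<Integer> process(int rowCount, int pageSize) {
        TreeSet<Integer> activeIds = new TreeSet<>();
        for (int id = 1; id <= rowCount; id++) {
            activeIds.add(id);
        }
        List<Integer> processed = new ArrayList<>();
        while (true) {
            // Page is always 0: first pageSize rows where active = true, ordered by id.
            List<Integer> items = activeIds.stream().limit(pageSize).collect(Collectors.toList());
            if (items.isEmpty()) {
                break;
            }
            activeIds.removeAll(items); // the "writer" sets active = false
            processed.addAll(items);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(6, 3)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

One thing to keep in mind with this approach: if an item still matches the query after it has been written, the first page keeps returning it, so the loop in this sketch would never end.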
org.springframework.batch.item.database.JpaPagingItemReader creates its own entityManager instance
(from org.springframework.batch.item.database.JpaPagingItemReader#doOpen):
entityManager = entityManagerFactory.createEntityManager(jpaPropertyMap);
If you are within a transaction, as seems to be the case, the entities read are not detached (from org.springframework.batch.item.database.JpaPagingItemReader#doReadPage):
if (!transacted) {
    List<T> queryResult = query.getResultList();
    for (T entity : queryResult) {
        entityManager.detach(entity);
        results.add(entity);
    } // end for
} else {
    results.addAll(query.getResultList());
    tx.commit();
}
For this reason, when you update an item in the processor or the writer, that item is still managed by the reader's entityManager.
When the item reader reads the next chunk of data, it flushes the context to the database.
So, in your case, after the first chunk has been processed, the database contains:
| id | active |
| 1  | false  |
| 2  | false  |
| 3  | false  |
org.springframework.batch.item.database.JpaPagingItemReader uses limit & offset to retrieve paginated data. So the next select created by the reader looks like:
select * from table where active = true limit 3 offset 3
The reader will miss the items with ids 4, 5 and 6, because they are now the first three rows matching the query and the offset skips them.
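The skipping can be reproduced without Spring Batch. The following is a minimal plain-Java sketch under stated assumptions: the class `OffsetPagingSkipDemo`, its `readPage` and `process` methods and the in-memory map standing in for the table are all hypothetical, and the step is assumed to stop at the first empty page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not Spring Batch code.
final class OffsetPagingSkipDemo {

    // Mimics: select id from table where active = true order by id limit pageSize offset page * pageSize
    static List<Integer> readPage(Map<Integer, Boolean> table, int page, int pageSize) {
        List<Integer> result = new ArrayList<>();
        int skipped = 0;
        for (Map.Entry<Integer, Boolean> row : table.entrySet()) {
            if (!row.getValue()) {
                continue;                 // where active = true
            }
            if (skipped < page * pageSize) {
                skipped++;                // offset
                continue;
            }
            if (result.size() == pageSize) {
                break;                    // limit
            }
            result.add(row.getKey());
        }
        return result;
    }

    // Reads page after page and deactivates every item it reads, like the writer in the question.
    static List<Integer> process(int rowCount, int pageSize) {
        Map<Integer, Boolean> table = new LinkedHashMap<>();
        for (int id = 1; id <= rowCount; id++) {
            table.put(id, true);
        }
        List<Integer> processed = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<Integer> items = readPage(table, page, pageSize);
            if (items.isEmpty()) {
                break;
            }
            for (int id : items) {
                table.put(id, false);
                processed.add(id);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(6, 3)); // prints [1, 2, 3]
        System.out.println(process(8, 2)); // prints [1, 2, 5, 6]
    }
}
```

With six rows and a page size of 3, ids 4, 5 and 6 are never read. With a page size of 2, the same sketch reads two rows, skips two, and so on.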
What you can do, as a workaround, is use the JDBC implementation (org.springframework.batch.item.database.JdbcPagingItemReader), as it does not use limit & offset. It pages on a sorted column (typically the id column), so you will not miss any data. Of course, you will have to update your data in the writer (using either JPA or a pure JDBC implementation).
The reader will be more verbose:
@Bean
public ItemReader<? extends Entity> reader() throws Exception {
    JdbcPagingItemReader<Entity> reader = new JdbcPagingItemReader<Entity>();
    final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
    sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
    sqlPagingQueryProviderFactoryBean.setSelectClause("select *");
    sqlPagingQueryProviderFactoryBean.setFromClause("from <your table name>");
    sqlPagingQueryProviderFactoryBean.setWhereClause("where active = true");
    // Pages are delimited by the sort key instead of an offset.
    sqlPagingQueryProviderFactoryBean.setSortKey("id");
    // getObject() can throw; let it propagate rather than return a reader without a query provider.
    reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
    reader.setDataSource(dataSource);
    reader.setPageSize(3);
    reader.setRowMapper(new BeanPropertyRowMapper<Entity>(Entity.class));
    return reader;
}
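The reason paging on a sorted column does not lose rows can be sketched in plain Java. This is only an illustration of the idea, not the SQL the reader actually generates: the class `SortKeyPagingDemo`, its methods and the in-memory map are hypothetical, and ids are assumed to start at 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical simulation, not Spring Batch code.
final class SortKeyPagingDemo {

    // Mimics: select id from table where active = true and id > lastId order by id limit pageSize
    static List<Integer> readPage(SortedMap<Integer, Boolean> table, int lastId, int pageSize) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> row : table.tailMap(lastId + 1).entrySet()) {
            if (row.getValue() && result.size() < pageSize) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    // Deactivates every item it reads and remembers the last id seen as the next page boundary.
    static List<Integer> process(int rowCount, int pageSize) {
        SortedMap<Integer, Boolean> table = new TreeMap<>();
        for (int id = 1; id <= rowCount; id++) {
            table.put(id, true);
        }
        List<Integer> processed = new ArrayList<>();
        int lastId = 0; // below the smallest id in this sketch
        while (true) {
            List<Integer> items = readPage(table, lastId, pageSize);
            if (items.isEmpty()) {
                break;
            }
            for (int id : items) {
                table.put(id, false);
                processed.add(id);
            }
            lastId = items.get(items.size() - 1);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(6, 3)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Because the page boundary is the last id read rather than a row count, rows dropping out of the `active = true` set do not shift the window.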
I had the same problem with rows being skipped based on the pageSize. If I have pageSize set to 2 for example, it would read 2, ignore 2, read 2, ignore 2 etc.
I was building a daemon processor to poll a 'Request' database table for records at a 'Waiting To Be Processed' status. The daemon is designed to run forever in the background.
I had a 'status' field which was defined in the @NamedQuery and would select records whose status was '10':Waiting to be processed. After the record was processed, the status field would be updated to '20':Error or '30':Success. This turned out to be the cause of the problem - I was updating a field which was defined in the query. If I introduced a 'processedField' and updated that instead of the 'status' field then no problem - all the records would be read.
As a possible solution to updating the status field, I used setMaxItemCount with the same value as the page size; this updated the records correctly before step completion. I then keep executing the step until a request is made to stop the daemon. It is probably not the most efficient way to do it (though I'm still benefiting from the ease of use that JPA provides), and I think it would probably be better to use JdbcPagingItemReader (described above, thanks!). Opinions on the best approach to this batch database polling problem would be welcome :)
A couple things to note:
Entities returned by JpaPagingItemReader are detached. We accomplish this in one of two ways. We either create a transaction before querying for the page, then commit the transaction (which detaches all entities associated with the EntityManager for that transaction) or we explicitly call entityManager.detach. We do this so that features like retry and skip can be correctly performed.
My guess is that in the //do some stuff section, your item is getting re-attached, which is why the update is occurring. However, without being able to see that code, I can't be sure.
An explicit ItemWriter should be used. In fact, I consider it a bug that we don't require an ItemWriter when using java config (we do for XML).
As for the missing records, keep in mind how the *PagingItemReaders work. They all execute independent queries for each page of data. So if you update the underlying data in between each page, it can have an impact on the items returned in future pages. For example, if my paging query specifies where val1 > 4 and I have a record whose val1 is updated from 1 to 5, then in chunk 2 that item may be returned, since it now meets the criteria. If you need to update values that are in your where clause (thereby impacting what falls into the set of data you'd be processing), it's best to add a processed flag of some kind that you can query by instead.
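The val1 > 4 example can be played through in a small plain-Java sketch. Everything here is hypothetical (the class `WhereClauseDriftDemo`, the sample rows, the page size of 2); it only shows how an update between two independent page queries changes what the second page returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical simulation, not Spring Batch code.
final class WhereClauseDriftDemo {

    // Mimics: select id from t where val1 > 4 order by id limit pageSize offset page * pageSize
    static List<Integer> readPage(SortedMap<Integer, Integer> val1ById, int page, int pageSize) {
        return val1ById.entrySet().stream()
                .filter(row -> row.getValue() > 4)
                .map(Map.Entry::getKey)
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    // Reads two pages; optionally updates row 4 from val1 = 1 to val1 = 5 in between.
    static List<Integer> run(boolean updateBetweenPages) {
        SortedMap<Integer, Integer> table = new TreeMap<>();
        table.put(1, 5);
        table.put(2, 6);
        table.put(3, 7);
        table.put(4, 1); // does not match val1 > 4 at the start
        List<Integer> seen = new ArrayList<>(readPage(table, 0, 2));
        if (updateBetweenPages) {
            table.put(4, 5); // now matches val1 > 4
        }
        seen.addAll(readPage(table, 1, 2));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [1, 2, 3]
        System.out.println(run(true));  // prints [1, 2, 3, 4]
    }
}
```

Row 4 is only picked up when it is updated between the two page queries, which is the effect described above.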