问题
I read a flat file (for example a .csv file with 1 line per User, Ex: UserId;Data1;Date2).
But how to handle duplicated User item in the reader (where is no list of previus readed users...)
stepBuilderFactory.get("createUserStep1")
.<User, User>chunk(1000)
.reader(flatFileItemReader) // FlatFileItemReader
.writer(itemWriter) // For example JDBC Writer
.build();
回答1:
Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered and not passed to the ItemWriter. Otherwise, it is. In your case, you could keep a list of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on. If it has been seen before, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords
/**
* This implementation assumes that there is enough room in memory to store the duplicate
* Users. Otherwise, you'd want to store them somewhere you can do a look-up on.
*/
public class UserFilterItemProcessor implements ItemProcessor<User, User> {
// This assumes that User.equals() identifies the duplicates
private Set<User> seenUsers = new HashSet<User>();
public User process(User user) {
if(seenUsers.contains(user)) {
return null;
}
seenUsers.add(user);
return user;
}
}
回答2:
As you could see here http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#faultTolerant
When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (uses skip or retry processing typically), any ItemProcessor used should be implemented in a way that is idempotent
This means that in Michael's example, the first time a user is Processed the user is cached in the Set and if there is a failure Writing the item, if the step is fault tolerance the Processor will be executed again for the same User and this Filter will filter out the user.
Improved code:
/**
* This implementation assumes that there is enough room in memory to store the duplicate
* Users. Otherwise, you'd want to store them somewhere you can do a look-up on.
*/
public class UserFilterItemProcessor implements ItemProcessor<User, User> {
// This assumes that User.equals() identifies the duplicates
private Set<User> seenUsers = new HashSet<User>();
public User process(User user) {
if(seenUsers.contains(user) && !user.hasBeenProcessed()) {
return null;
} else {
seenUsers.add(user);
user.setProcessed(true);
return user;
}
}
}
来源:https://stackoverflow.com/questions/27318466/spring-batch-how-to-filter-duplicated-items-before-send-it-to-itemwriter