Spring Batch: how to filter duplicate items before sending them to the ItemWriter

Submitted by 僤鯓⒐⒋嵵緔 on 2019-11-30 07:18:15

Question


I read a flat file (for example, a .csv file with one line per User, e.g. UserId;Data1;Date2).

How can I handle duplicate User items, given that the reader keeps no list of previously read users?

stepBuilderFactory.get("createUserStep1")
.<User, User>chunk(1000)
.reader(flatFileItemReader) // FlatFileItemReader
.writer(itemWriter) // For example JDBC Writer
.build();

Answer 1:


Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered and not passed to the ItemWriter. Otherwise, it is. In your case, you could keep a list of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on. If it has been seen before, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords

import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store every
 * User seen so far.  Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() (and hashCode()) identifies the duplicates
    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        // Returning null filters the item, so it never reaches the ItemWriter
        if (seenUsers.contains(user)) {
            return null;
        }
        seenUsers.add(user);
        return user;
    }
}
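
If User.equals() is not defined in terms of UserId, or if holding whole User objects costs too much memory, one possible variation is to remember only the identifying key. The following is a minimal sketch under stated assumptions, not Spring Batch code: the ItemProcessor interface is a local stand-in with the same shape as the framework's, and the User class with its getUserId() accessor is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in mirroring the shape of Spring Batch's ItemProcessor,
// declared here only so the sketch compiles without the framework.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical User type; the real class and its accessors may differ.
class User {
    private final String userId;
    private final String data;

    User(String userId, String data) {
        this.userId = userId;
        this.data = data;
    }

    String getUserId() { return userId; }
    String getData() { return data; }
}

// Remembers only the user IDs, not the whole User objects.
class UserIdFilterItemProcessor implements ItemProcessor<User, User> {

    private final Set<String> seenUserIds = new HashSet<>();

    @Override
    public User process(User user) {
        // Set.add returns false when the ID was already present,
        // so one call both records the ID and detects a duplicate.
        return seenUserIds.add(user.getUserId()) ? user : null;
    }
}
```

With this variation the first line read for a given UserId wins and later lines with the same ID are dropped, whatever their other fields contain. Like the class above, it keeps state between calls, so it is not idempotent if the same item is processed twice.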



Answer 2:


As you can see here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#faultTolerant

When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (uses skip or retry processing typically), any ItemProcessor used should be implemented in a way that is idempotent

This means that in Michael's example, the first time a user is processed, the user is cached in the Set. If writing the item then fails and the step is fault tolerant, the processor is executed again for the same User, and this time the filter wrongly drops the user.

Improved code:

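import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store every
 * User seen so far.  Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() identifies the duplicates and that the
    // processed flag takes no part in User.equals() or User.hashCode()
    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        // Already in the set but not yet marked: a genuine duplicate, filter it.
        // Already in the set and marked: the same item being reprocessed after
        // a rollback, so pass it on again.
        if (seenUsers.contains(user) && !user.hasBeenProcessed()) {
            return null;
        }
        seenUsers.add(user);
        user.setProcessed(true);
        return user;
    }
}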
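
To see why the flag matters, the reprocessing can be simulated without Spring Batch by calling process twice on the same instance. This is only a sketch: it assumes that after a rollback the step hands the same cached User object back to the processor, and the User, NaiveFilter, FlaggedFilter, and ReprocessDemo classes are hypothetical stand-ins written for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical User: equality is by userId; the processed flag is a
// transient marker that takes no part in equals/hashCode.
class User {
    private final String userId;
    private boolean processed;

    User(String userId) { this.userId = userId; }

    boolean hasBeenProcessed() { return processed; }
    void setProcessed(boolean processed) { this.processed = processed; }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && userId.equals(((User) o).userId);
    }

    @Override
    public int hashCode() { return Objects.hash(userId); }
}

// Same logic as the first answer's filter.
class NaiveFilter {
    private final Set<User> seenUsers = new HashSet<>();

    User process(User user) {
        return seenUsers.add(user) ? user : null;
    }
}

// Same logic as the improved filter above.
class FlaggedFilter {
    private final Set<User> seenUsers = new HashSet<>();

    User process(User user) {
        if (seenUsers.contains(user) && !user.hasBeenProcessed()) {
            return null;
        }
        seenUsers.add(user);
        user.setProcessed(true);
        return user;
    }
}

class ReprocessDemo {
    public static void main(String[] args) {
        User first = new User("42");
        NaiveFilter naive = new NaiveFilter();
        naive.process(first);
        // Second pass over the same cached instance, as after a rollback.
        System.out.println("naive keeps user on retry: " + (naive.process(first) != null));

        User second = new User("42");
        FlaggedFilter flagged = new FlaggedFilter();
        flagged.process(second);
        System.out.println("flagged keeps user on retry: " + (flagged.process(second) != null));
        System.out.println("flagged drops true duplicate: " + (flagged.process(new User("42")) == null));
    }
}
```

Running ReprocessDemo prints false for the naive filter and true for both flagged checks: the naive filter loses the user on the second pass, while the flagged filter lets the retried instance through and still drops a separate object with the same ID.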


Source: https://stackoverflow.com/questions/27318466/spring-batch-how-to-filter-duplicated-items-before-send-it-to-itemwriter
