Spring Batch: how to filter duplicate items before sending them to the ItemWriter

温柔的废话 2020-12-15 12:56

I read a flat file (for example, a .csv file with one line per User, e.g. UserId;Data1;Date2).

But how can I handle duplicate User items in the reader (where is no

2 Answers
  • 2020-12-15 13:38

    As you can see here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#faultTolerant

    When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (uses skip or retry processing typically), any ItemProcessor used should be implemented in a way that is idempotent.

    This means that in Michael's example, the first time a user is processed, the user is cached in the Set. If writing that item then fails and the step is fault tolerant, the processor is executed again for the same User, and this time the filter drops the user, so the user is never written.
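The failure mode described above can be reproduced without Spring Batch. The following is only a minimal simulation: `ReprocessDemo` and `NaiveFilter` are hypothetical names, plain user ids stand in for `User` items, and the second `process` call imitates a fault-tolerant step handing a cached item back to the processor after a rollback.

```java
import java.util.HashSet;
import java.util.Set;

// Spring-free simulation; user ids stand in for User items (an assumption).
class ReprocessDemo {

    // Same logic as a plain Set-based filter: returning null means "filtered".
    static class NaiveFilter {
        private final Set<String> seenUserIds = new HashSet<>();

        String process(String userId) {
            // Set.add returns false when the id was already present.
            return seenUserIds.add(userId) ? userId : null;
        }
    }

    public static void main(String[] args) {
        NaiveFilter filter = new NaiveFilter();

        // First attempt: the item passes the filter and would go to the writer.
        String firstAttempt = filter.process("42");

        // Suppose the write fails and the chunk is rolled back: the same
        // cached item is handed to the processor a second time.
        String secondAttempt = filter.process("42");

        System.out.println("first attempt passed: " + (firstAttempt != null));
        System.out.println("second attempt passed: " + (secondAttempt != null));
    }
}
```

The second attempt is filtered even though the item was never written, which is why the processor is not idempotent in the sense the documentation asks for.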

    Improved code:

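    import java.util.HashSet;
    import java.util.Set;

    import org.springframework.batch.item.ItemProcessor;

    /**
     * This implementation assumes that there is enough room in memory to store the duplicate
     * Users.  Otherwise, you'd want to store them somewhere you can do a look-up on.
     */
    public class UserFilterItemProcessor implements ItemProcessor<User, User> {

        // This assumes that User.equals() identifies the duplicates and that the
        // processed flag is not part of equals()/hashCode()
        private final Set<User> seenUsers = new HashSet<>();

        @Override
        public User process(User user) {
            // Filter only a genuine duplicate: an equal User that has not already been
            // passed on. A reprocessed item keeps its flag and goes through again.
            if (seenUsers.contains(user) && !user.hasBeenProcessed()) {
                return null;
            }
            seenUsers.add(user);
            user.setProcessed(true);
            return user;
        }
    }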
    
  • 2020-12-15 13:51

    Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered and not passed to the ItemWriter. Otherwise, it is. In your case, you could keep a list of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on. If it has been seen before, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords

    import java.util.HashSet;
    import java.util.Set;

    import org.springframework.batch.item.ItemProcessor;

    /**
     * This implementation assumes that there is enough room in memory to store the duplicate
     * Users.  Otherwise, you'd want to store them somewhere you can do a look-up on.
     */
    public class UserFilterItemProcessor implements ItemProcessor<User, User> {

        // This assumes that User.equals() identifies the duplicates
        private final Set<User> seenUsers = new HashSet<>();

        @Override
        public User process(User user) {
            // Returning null tells Spring Batch to filter the item out
            if (seenUsers.contains(user)) {
                return null;
            }
            seenUsers.add(user);
            return user;
        }
    }
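Both processors rely on `User.equals()` to identify duplicates, and a `HashSet` additionally needs a matching `hashCode()`. The thread does not show the `User` class, so the following is only a sketch under assumptions: the field names are guessed from the `UserId;Data1;Date2` layout in the question, and two users are treated as duplicates when their ids match.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical User item; field names are assumptions based on the
// UserId;Data1;Date2 layout mentioned in the question.
final class User {
    private final String userId;
    private final String data1;
    private final String date2;

    User(String userId, String data1, String date2) {
        this.userId = userId;
        this.data1 = data1;
        this.date2 = date2;
    }

    String getUserId() {
        return userId;
    }

    // Two users count as duplicates when their ids match; other columns are ignored.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof User)) {
            return false;
        }
        return Objects.equals(userId, ((User) other).userId);
    }

    // Must be consistent with equals() so that HashSet lookups find duplicates.
    @Override
    public int hashCode() {
        return Objects.hashCode(userId);
    }

    public static void main(String[] args) {
        Set<User> seenUsers = new HashSet<>();
        System.out.println(seenUsers.add(new User("1", "a", "2020-01-01"))); // true
        System.out.println(seenUsers.add(new User("1", "b", "2020-02-01"))); // false
        System.out.println(seenUsers.add(new User("2", "a", "2020-01-01"))); // true
    }
}
```

If the processed flag from the first answer is added to such a class, it should stay out of `equals()` and `hashCode()`, otherwise a duplicate would no longer match the User already stored in the Set.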
    