Spring Batch: how to filter duplicate items before sending them to the ItemWriter

温柔的废话 2020-12-15 12:56

I read a flat file (for example a .csv file with one line per User, e.g. UserId;Data1;Date2).
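
(Illustrative reader configuration for such a file; the User class and the field names userId/data1/data2 are assumptions, shown here with Spring Batch's FlatFileItemReaderBuilder:)

    // Maps each ';'-separated line onto a User bean
    BeanWrapperFieldSetMapper<User> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(User.class);

    FlatFileItemReader<User> reader = new FlatFileItemReaderBuilder<User>()
            .name("userItemReader")
            .resource(new FileSystemResource("users.csv"))   // path is illustrative
            .delimited()
            .delimiter(";")                                   // fields separated by ';'
            .names("userId", "data1", "data2")                // assumed property names on User
            .fieldSetMapper(fieldSetMapper)
            .build();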

But how do I handle duplicated User items in the reader (where there is no …

2 Answers
  •  孤城傲影
    2020-12-15 13:38

    As you can see here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#faultTolerant

    When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (uses skip or retry processing typically), any ItemProcessor used should be implemented in a way that is idempotent

    This means that, in Michael's example, the first time a User is processed it is cached in the Set. If writing the item then fails and the step is fault tolerant, the Processor is executed again for that same User, and the filter would wrongly filter it out.
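
    Michael's original example is not quoted in this thread, but it is presumably a plain Set-based filter, roughly along these lines (a sketch, not his exact code):

        private Set<User> seenUsers = new HashSet<>();

        public User process(User user) {
            if (seenUsers.contains(user)) {
                return null;   // duplicate: dropped, never reaches the writer
            }
            seenUsers.add(user);
            return user;
        }

    The improved version below makes this idempotent by flagging Users that have already gone through the processor.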

    Improved code:

    import java.util.HashSet;
    import java.util.Set;

    import org.springframework.batch.item.ItemProcessor;

    /**
     * This implementation assumes that there is enough room in memory to store the duplicate
     * Users.  Otherwise, you'd want to store them somewhere you can do a look-up on.
     */
    public class UserFilterItemProcessor implements ItemProcessor<User, User> {

        // This assumes that User.equals()/hashCode() identify the duplicates
        private Set<User> seenUsers = new HashSet<>();

        @Override
        public User process(User user) {
            // Already seen but not yet flagged as processed: a real duplicate read
            // from the file, so filter it out by returning null.
            if (seenUsers.contains(user) && !user.hasBeenProcessed()) {
                return null;
            } else {
                // First occurrence, or re-processing of the same item after a rollback:
                // pass it on and flag it so a retry is not mistaken for a duplicate.
                seenUsers.add(user);
                user.setProcessed(true);
                return user;
            }
        }
    }
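
    For completeness, this is roughly how such a processor could be plugged into a fault-tolerant chunk step (a sketch only; the step name, chunk size and retried exception are assumptions, inside a @Configuration class using Spring Batch 4's StepBuilderFactory):

    @Bean
    public Step userStep(StepBuilderFactory stepBuilderFactory,
                         ItemReader<User> userReader,
                         ItemWriter<User> userWriter) {
        return stepBuilderFactory.get("userStep")
                .<User, User>chunk(10)
                .reader(userReader)
                .processor(new UserFilterItemProcessor())   // filters duplicates before the writer
                .writer(userWriter)
                .faultTolerant()                             // rollback + re-processing becomes possible here
                .retry(DeadlockLoserDataAccessException.class)   // illustrative retryable exception
                .retryLimit(3)
                .build();
    }

    Because the step is faultTolerant(), a rollback can cause process() to be called again for the same User, which is exactly why the processed flag above is needed.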
    
