Question
We have some data coming in from a flat file, e.g.
EmpCode,Salary,EmpName,...
100,1000,...,...
200,2000,...,...
200,2000,...,...
100,1000,...,...
300,3000,...,...
400,4000,...,...
We would like to aggregate the salary based on the EmpCode and write it to the database as:
Emp_Code  Emp_Salary  Updated_Time  Updated_User
100       2000        ...           ...
200       4000        ...           ...
300       3000        ...           ...
400       4000        ...           ...
I have written the classes for Spring Batch as follows:
ItemReader - to read the employee data into an Employee object (a possible reader definition is sketched below)
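For completeness, a minimal sketch of what such a reader could look like in the XML configuration, assuming a comma-delimited file with a header line and column names matching the Employee properties (the file name and field names here are assumptions, not taken from the question):

<bean id="employeeItemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- input file location is an assumption -->
    <property name="resource" value="file:employees.csv"/>
    <!-- skip the EmpCode,Salary,EmpName,... header line -->
    <property name="linesToSkip" value="1"/>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <!-- column names assumed to match the Employee bean's properties -->
                    <property name="names" value="empCode,salary,empName"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <!-- use the fully qualified class name of Employee in practice -->
                    <property name="targetType" value="Employee"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>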
A sample EmployeeItemProcessor:
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        return employee;
    }
}
EmployeeItemWriter:
@Repository
public class EmployeeItemWriter implements ItemWriter<Employee> {

    @Autowired
    private SessionFactory sf;

    @Override
    public void write(List<? extends Employee> employeeList) throws Exception {
        List<Employee> aggEmployeeList = aggregateEmpData(employeeList);
        // write to db using the session factory
        // (assumes a Spring-managed, transactional Hibernate session)
        Session session = sf.getCurrentSession();
        for (Employee e : aggEmployeeList) {
            session.saveOrUpdate(e);
        }
    }

    private List<Employee> aggregateEmpData(List<? extends Employee> employeeList) {
        Map<String, Employee> map = new HashMap<String, Employee>();
        for (Employee e : employeeList) {
            String empCode = e.getEmpCode();
            if (map.containsKey(empCode)) {
                // get the employee's salary and add it up
                // (getSalary/setSalary are assumed accessors on Employee)
                Employee existing = map.get(empCode);
                existing.setSalary(existing.getSalary() + e.getSalary());
            } else {
                map.put(empCode, e);
            }
        }
        return new ArrayList<Employee>(map.values());
    }
}
XML Configuration
...
<batch:job id="employeeJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="employeeItemReader"
                         writer="employeeItemWriter"
                         processor="employeeItemProcessor"
                         commit-interval="100"/>
        </batch:tasklet>
    </batch:step>
</batch:job>
...
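For reference, a roughly equivalent Java-based configuration (a sketch only; it assumes the pre-5.0 JobBuilderFactory/StepBuilderFactory API and the same three beans as the XML above):

@Configuration
@EnableBatchProcessing
public class EmployeeJobConfig {

    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<Employee> employeeItemReader,
                      ItemProcessor<Employee, Employee> employeeItemProcessor,
                      ItemWriter<Employee> employeeItemWriter) {
        return steps.get("step1")
                .<Employee, Employee>chunk(100) // same as commit-interval="100"
                .reader(employeeItemReader)
                .processor(employeeItemProcessor)
                .writer(employeeItemWriter)
                .build();
    }

    @Bean
    public Job employeeJob(JobBuilderFactory jobs, Step step1) {
        return jobs.get("employeeJob").start(step1).build();
    }
}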
It is working and serving my purpose. However, I have a couple of questions.
1) When I look at the logs, they show the following (commit-interval=100):
status=COMPLETED, exitStatus=COMPLETED, readCount=2652, filterCount=0, writeCount=2652, readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=27, rollbackCount=0
But after aggregation, only 2515 records were written to the database, while the write count is 2652. Is it because the number of items reaching the ItemWriter is still 2652? How can this be corrected?
2) We are iterating through the list twice: once in the ItemProcessor and then again in the ItemWriter for aggregation. This could be a performance problem if the number of records is higher. Is there a better way to achieve this?
Answer 1:
Why do the aggregation in the ItemWriter? I'd do it in an ItemProcessor. This would keep the write count accurate and separate that component from the act of actually writing. If you provide some insight into your configuration, we could elaborate more.
Answer 2:
If each line of the input file is an employee object, then your ReadCount is the number of lines in the input file. WriteCount is the summation of the sizes of all the lists passed to the item writer. So your aggregateEmpData function probably removes some records or aggregates several into one, and hence your db count is not the same as WriteCount. If you want to make sure that WriteCount is exactly the number of records in the db, you should do your aggregation in the processor.
Answer 3:
I managed to get it working as follows.
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

    private Map<String, Employee> map;

    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        String empCode = employee.getEmpCode();
        if (map.containsKey(empCode)) {
            // get the employee's salary and add it to the record already seen
            // (getSalary/setSalary are assumed accessors on Employee)
            Employee existing = map.get(empCode);
            existing.setSalary(existing.getSalary() + employee.getSalary());
            // returning null filters this item out, so it never reaches the writer
            return null;
        }
        map.put(empCode, employee);
        return employee;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        map = new HashMap<String, Employee>();
    }
}
The write count is appearing correctly now: items for which the processor returns null are filtered rather than written, so they increment filterCount instead of writeCount.
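One caveat (a hedged note, based on how Spring Batch discovers listeners rather than on anything stated in the question): if the @BeforeStep callback is not invoked in your setup, you may need to register the processor explicitly as a listener on the step, e.g.:

<batch:tasklet>
    <batch:chunk reader="employeeItemReader"
                 writer="employeeItemWriter" processor="employeeItemProcessor"
                 commit-interval="100"/>
    <batch:listeners>
        <batch:listener ref="employeeItemProcessor"/>
    </batch:listeners>
</batch:tasklet>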
Source: https://stackoverflow.com/questions/33825535/spring-batch-aggregating-records-and-write-count