Question
We have some data coming in from a flat file, e.g.
EmpCode,Salary,EmpName,...
100,1000,...,...
200,2000,...,...
200,2000,...,...
100,1000,...,...
300,3000,...,...
400,4000,...,...
We would like to aggregate the salary based on the EmpCode and write it to the database as:
Emp_Code  Emp_Salary  Updated_Time  Updated_User
100       2000        ...           ...
200       4000        ...           ...
300       3000        ...           ...
400       4000        ...           ...
I have written the classes for Spring Batch as follows:
ItemReader - to read the employee data into an Employee object (a possible reader definition is sketched below)
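For completeness, a minimal sketch of what such a reader could look like in the XML configuration, assuming a comma-delimited file with a header line and column names matching the Employee properties (the file name and field names here are assumptions, not taken from the question):

<bean id="employeeItemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- input file location is an assumption -->
    <property name="resource" value="file:employees.csv"/>
    <!-- skip the EmpCode,Salary,EmpName,... header line -->
    <property name="linesToSkip" value="1"/>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <!-- column names assumed to match the Employee bean's properties -->
                    <property name="names" value="empCode,salary,empName"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <!-- use the fully qualified class name of Employee in practice -->
                    <property name="targetType" value="Employee"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>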
A sample EmployeeItemProcessor:
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        return employee;
    }
}
EmployeeItemWriter:
@Repository
public class EmployeeItemWriter implements ItemWriter<Employee> {

    @Autowired
    private SessionFactory sf;

    @Override
    public void write(List<? extends Employee> employeeList) throws Exception {
        List<Employee> aggEmployeeList = aggregateEmpData(employeeList);
        // write to db using the session factory
        // (assumes a Spring-managed, transactional Hibernate session)
        Session session = sf.getCurrentSession();
        for (Employee e : aggEmployeeList) {
            session.saveOrUpdate(e);
        }
    }

    private List<Employee> aggregateEmpData(List<? extends Employee> employeeList) {
        Map<String, Employee> map = new HashMap<String, Employee>();
        for (Employee e : employeeList) {
            String empCode = e.getEmpCode();
            if (map.containsKey(empCode)) {
                // get the employee's salary and add it up
                // (getSalary/setSalary are assumed accessors on Employee)
                Employee existing = map.get(empCode);
                existing.setSalary(existing.getSalary() + e.getSalary());
            } else {
                map.put(empCode, e);
            }
        }
        return new ArrayList<Employee>(map.values());
    }
}
XML Configuration
...
<batch:job id="employeeJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="employeeItemReader"
                         writer="employeeItemWriter"
                         processor="employeeItemProcessor"
                         commit-interval="100"/>
        </batch:tasklet>
    </batch:step>
</batch:job>
...
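For reference, a roughly equivalent Java-based configuration (a sketch only; it assumes the pre-5.0 JobBuilderFactory/StepBuilderFactory API and the same three beans as the XML above):

@Configuration
@EnableBatchProcessing
public class EmployeeJobConfig {

    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<Employee> employeeItemReader,
                      ItemProcessor<Employee, Employee> employeeItemProcessor,
                      ItemWriter<Employee> employeeItemWriter) {
        return steps.get("step1")
                .<Employee, Employee>chunk(100) // same as commit-interval="100"
                .reader(employeeItemReader)
                .processor(employeeItemProcessor)
                .writer(employeeItemWriter)
                .build();
    }

    @Bean
    public Job employeeJob(JobBuilderFactory jobs, Step step1) {
        return jobs.get("employeeJob").start(step1).build();
    }
}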
It is working and serving my purpose. However, I have a couple of questions.
1) When I look at the logs, they show the following (commit-interval=100):
status=COMPLETED, exitStatus=COMPLETED, readCount=2652, filterCount=0, writeCount=2652, readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=27, rollbackCount=0
But after aggregation, only 2515 records were written to the database, while the write count is 2652. Is it because the number of items reaching the ItemWriter is still 2652? How can this be corrected?
2) We are iterating through the list twice: once in the ItemProcessor and then again in the ItemWriter for aggregation. This could be a performance problem if the number of records is higher. Is there a better way to achieve this?
Answer 1:
Why do the aggregation in the ItemWriter? I'd do it in an ItemProcessor. This would keep the write count accurate and separate that component from the act of actually writing. If you provide some insight into your configuration, we could elaborate more.
Answer 2:
If each line of the input file is an employee object, then your ReadCount is the number of lines in the input file. WriteCount is the summation of the sizes of all the lists passed to the item writer. So your aggregateEmpData function probably removes some records or aggregates several into one, and hence your db count is not the same as WriteCount. If you want to make sure that WriteCount is exactly the number of records in the db, you should do your aggregation in the processor.
Answer 3:
I managed to get it working as follows.
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

    private Map<String, Employee> map;

    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        String empCode = employee.getEmpCode();
        if (map.containsKey(empCode)) {
            // get the employee's salary and add it to the record already seen
            // (getSalary/setSalary are assumed accessors on Employee)
            Employee existing = map.get(empCode);
            existing.setSalary(existing.getSalary() + employee.getSalary());
            // returning null filters this item out, so it never reaches the writer
            return null;
        }
        map.put(empCode, employee);
        return employee;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        map = new HashMap<String, Employee>();
    }
}
The write count is appearing correctly now: items for which the processor returns null are filtered rather than written, so they increment filterCount instead of writeCount.
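One caveat (a hedged note, based on how Spring Batch discovers listeners rather than on anything stated in the question): if the @BeforeStep callback is not invoked in your setup, you may need to register the processor explicitly as a listener on the step, e.g.:

<batch:tasklet>
    <batch:chunk reader="employeeItemReader"
                 writer="employeeItemWriter" processor="employeeItemProcessor"
                 commit-interval="100"/>
    <batch:listeners>
        <batch:listener ref="employeeItemProcessor"/>
    </batch:listeners>
</batch:tasklet>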
Source: https://stackoverflow.com/questions/33825535/spring-batch-aggregating-records-and-write-count