Question
I am writing a Spring Batch job that reads from a flat file, does a little processing, and writes a summary to an output file. My processor and writer are relatively quick compared to the reader. I am using FlatFileItemReader and have tried a wide range of commit intervals, from 50 to 1000. My batch job has to process 10 million records at a fast rate. Kindly let me know ways to improve the speed of FlatFileItemReader. I am pasting my config file below; my mapper class reads the field set and sets the values on a POJO bean. Thanks a lot in advance.
BatchFileConfig.xml
<!-- Flat File Item Reader and its dependency configuration starts here -->
<bean id="flatFileReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:flatfiles/input_10KFile.txt" />
    <property name="encoding" value="UTF-8" />
    <property name="linesToSkip" value="1" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="var1,var2,var3,var4,var5,var6" />
                    <!-- tab delimiter; a literal tab inside an attribute is normalized
                         to a space by XML parsers, so use the character reference -->
                    <property name="delimiter" value="&#9;" />
                    <property name="strict" value="false" />
                </bean>
            </property>
            <property name="fieldSetMapper" ref="companyMapper" />
        </bean>
    </property>
</bean>
CompanyMapper.java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

public class CompanyMapper implements FieldSetMapper<Company> {

    private static final Logger logger = LoggerFactory.getLogger(CompanyMapper.class);

    @Override
    public Company mapFieldSet(FieldSet fieldSet) throws BindException {
        logger.warn("Start time is " + System.currentTimeMillis());
        if (fieldSet != null) {
            Company company = new Company();
            company.setvar1(fieldSet.readString("var1"));
            company.setvar2(fieldSet.readInt("var2"));
            company.setvar3(fieldSet.readString("var3"));
            company.setvar4(fieldSet.readInt("var4"));
            company.setvar5(fieldSet.readInt("var5"));
            company.setvar6(fieldSet.readInt("var6"));
            return company;
        }
        return null;
    }
}
Answer 1:
I don't think you can speed up the process much :/
CompanyMapper is already a custom implementation, so you can consider the following (sketches of both follow the list):
- write a custom LineTokenizer + FieldSet pair to avoid a lot of (useful) checks and error handling
- write a custom BufferedReaderFactory to create your own BufferedReader implementation that wraps a custom (and faster) InputStream implementation (search Google for that)
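As a rough illustration of the first point, here is a minimal sketch of a stripped-down LineTokenizer, assuming exactly six tab-separated columns and no quoting or escaping (the class name is made up):

import org.springframework.batch.item.file.transform.DefaultFieldSet;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.LineTokenizer;

// Skips the quote handling and column-count validation that
// DelimitedLineTokenizer performs on every single line.
public class FastTabTokenizer implements LineTokenizer {

    private static final String[] NAMES = {"var1", "var2", "var3", "var4", "var5", "var6"};

    @Override
    public FieldSet tokenize(String line) {
        // -1 keeps trailing empty columns instead of dropping them
        String[] tokens = line.split("\t", -1);
        return new DefaultFieldSet(tokens, NAMES);
    }
}

For the second point, a minimal sketch of a custom BufferedReaderFactory that simply enlarges the read buffer (the 1 MB size is an assumption; tune it for your hardware). You would wire it into the reader through its bufferedReaderFactory property:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.springframework.batch.item.file.BufferedReaderFactory;
import org.springframework.core.io.Resource;

public class LargeBufferReaderFactory implements BufferedReaderFactory {

    // JDK default is 8 KB; a bigger buffer means fewer underlying reads
    private static final int BUFFER_SIZE = 1024 * 1024;

    @Override
    public BufferedReader create(Resource resource, String encoding) throws IOException {
        return new BufferedReader(
                new InputStreamReader(resource.getInputStream(), encoding), BUFFER_SIZE);
    }
}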
Answer 2:
Since you are talking about 10 million records, I would suggest you use Spring Batch's scaling features. I recently did an implementation that posted 5-8 million records to a database. To get the performance, I split the file into chunks of roughly 1 million records using a FileChannel (fast for read/write), and then, using partitioning, read each 1-million-record file in a slave step on a separate thread. You might not see much of a difference for small data sets, but at this magnitude it makes a huge difference. Also, as suggested by @M. Deinum, try to remove the logging; it will slow things down for sure. A sketch of the partitioned setup follows.
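For illustration, a minimal sketch of such a partitioned setup in the same XML style, assuming the batch namespace is declared and the input has already been split into part files (the file pattern, bean and step names, and grid size are all made up). MultiResourcePartitioner publishes each file's URL in the step execution context under the key fileName, so the reader must be made step-scoped and pick up its resource from there:

<bean id="partitioner"
    class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="classpath:flatfiles/split/input_part_*.txt" />
</bean>

<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />

<batch:job id="importJob">
    <batch:step id="masterStep">
        <!-- fans the part files out to parallel executions of slaveStep -->
        <batch:partition step="slaveStep" partitioner="partitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="slaveStep">
    <batch:tasklet>
        <batch:chunk reader="flatFileReader" processor="companyProcessor"
            writer="summaryWriter" commit-interval="1000" />
    </batch:tasklet>
</batch:step>

<!-- flatFileReader then needs scope="step" and its resource replaced with
     <property name="resource" value="#{stepExecutionContext['fileName']}" /> -->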
Answer 3:
Hello, the ultimate way to speed up your reader is to read your file into memory. Provided that you have enough memory, you can read it all at once; if you don't, read as much as you can. Once you have it in memory, you need to provide your own "resource" that, instead of pointing to the physical file, points to the in-memory content. Given that modern hard disk speeds exceed 500 MB per second, even a huge file can be read entirely into memory in a couple of seconds.
Once it is in memory, all your operations will run an order of magnitude faster. This also gives you linear scaling capabilities if you want them.
With the content in memory, you can easily parallelise the work without forming a bottleneck around your hard disk. A sketch of the idea follows.
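A minimal sketch of that approach using Spring's built-in ByteArrayResource, assuming the whole file fits in the heap (the class name is made up; the path is the one from the question's config):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.core.io.Resource;

public class InMemoryResourceLoader {

    // One bulk read from disk; afterwards the item reader never touches the disk.
    public static Resource loadIntoMemory(String path) throws IOException {
        byte[] content = Files.readAllBytes(Paths.get(path));
        return new ByteArrayResource(content);
    }
}

Inject the returned Resource into the FlatFileItemReader's resource property in place of the classpath: value.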
Source: https://stackoverflow.com/questions/20243629/how-to-increase-the-performance-of-flatfileitemreader-in-springbatch