Spring Batch - Reading a large flat file - Choices to scale horizontally?

Submitted by 我们两清 on 2019-11-28 08:43:26

Assuming the "minor processing" isn't the bottleneck, the best way to scale this type of job is partitioning. The job would have two steps. The first splits the large file into smaller files. For this, I'd recommend using the SystemCommandTasklet to shell out to the OS and split the file there (this typically performs better than streaming the entire file through the JVM). An example would look something like this:

<bean id="fileSplittingTasklet" class="org.springframework.batch.core.step.tasklet.SystemCommandTasklet" scope="step">
    <property name="command" value="split -a 5 -l 10000 #{jobParameters['inputFile']} #{jobParameters['stagingDirectory']}"/>
    <property name="timeout" value="60000"/>
    <property name="workingDirectory" value="/tmp/input_temp"/>
</bean>
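
One thing worth checking with the command above: `split` treats its last operand as a file name prefix, not as a directory, and `-a 5` asks for five-letter suffixes (`aaaaa`, `aaaab`, ...). So whether the pieces land inside the staging directory depends on whether the `stagingDirectory` parameter ends with a slash. The following is only a rough sketch in plain Java of how the output names come out; the class and method names are made up for illustration and are not part of Spring Batch or of `split`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics how `split -a <width> -l <linesPerFile>` names its output files.
class SplitNamingSketch {

    // Turns a zero-based chunk index into an alphabetic suffix of the given width (0 -> "aaaaa").
    static String suffix(int index, int width) {
        char[] chars = new char[width];
        for (int pos = width - 1; pos >= 0; pos--) {
            chars[pos] = (char) ('a' + index % 26);
            index /= 26;
        }
        if (index != 0) {
            throw new IllegalArgumentException("Not enough suffix characters for this many chunks");
        }
        return new String(chars);
    }

    // Lists the file names produced for a file of totalLines lines, cut every linesPerFile lines.
    static List<String> chunkNames(String prefix, long totalLines, long linesPerFile, int width) {
        long chunks = (totalLines + linesPerFile - 1) / linesPerFile; // ceiling division
        List<String> names = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            names.add(prefix + suffix(i, width)); // the prefix is concatenated as-is
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical paths: with and without a trailing slash on the prefix.
        System.out.println(chunkNames("/data/staging/", 25000, 10000, 5));
        System.out.println(chunkNames("/data/staging", 25000, 10000, 5));
    }
}
```

With a trailing slash the pieces are created inside the directory; without it they are created next to it, with the directory name as the start of each file name.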

The second step would be a partitioned step. If the resulting files are not in a shared location, you'd use local partitioning. If they are on a network share, you can use remote partitioning. In either case, you'd use the MultiResourcePartitioner to generate one StepExecution per file. These are then executed by the workers ("slaves" in older Spring Batch terminology), either running locally on threads or remotely, listening to some messaging middleware.
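
To make "one StepExecution per file" a bit more concrete, here is a plain-Java sketch of the plan the MultiResourcePartitioner builds: one entry per file, named `partition0`, `partition1`, and so on, each carrying the file's URL under the key `fileName` (the partitioner's default key name). The class below and its use of plain maps in place of Spring Batch's ExecutionContext are simplifications for illustration, not the framework's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: approximates the per-file partition plan with plain maps.
class PartitionPlanSketch {

    // Builds one "execution context" per file; the number of partitions equals the number of files.
    static Map<String, Map<String, String>> plan(List<Path> files) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        int index = 0;
        for (Path file : files) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put("fileName", file.toUri().toString()); // a file: URL the worker's reader can open
            partitions.put("partition" + index++, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical file names, as produced by the splitting step.
        List<Path> files = Arrays.asList(
                Paths.get("staging-example", "aaaaa"),
                Paths.get("staging-example", "aaaab"));
        plan(files).forEach((name, context) -> System.out.println(name + " -> " + context));
    }
}
```

Each worker then receives exactly one of these contexts, which is why a step-scoped reader can pick up its file from the step execution context.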

One thing to note with this approach: because the smaller files are processed in parallel, the order in which the records appear in the original file is not preserved during processing.

You can see a complete remote partitioning example here: https://github.com/mminella/Spring-Batch-Talk-2.0 and a video of the talk/demo can be found here: https://www.youtube.com/watch?v=CYTj5YT7CZU

I used MultiResourcePartitioner to read large files, and this worked for me:

        @Bean
        public Partitioner partitioner() {
            MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
            ClassLoader cl = this.getClass().getClassLoader();
            ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
            try {
                // One partition is created per matching file; the framework calls partition() itself.
                partitioner.setResources(resolver.getResources("file:" + filePath + "/*.csv"));
            } catch (IOException e) { // java.io.IOException is thrown by getResources()
                throw new UncheckedIOException("Could not list CSV files in " + filePath, e);
            }
            return partitioner;
        }

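        @Bean
        public TaskExecutor taskExecutor() {
            ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
            // The core size controls concurrency: with the default unbounded queue,
            // the pool never grows past it, so setting only the max size leaves one thread.
            taskExecutor.setCorePoolSize(4);
            taskExecutor.setMaxPoolSize(4);
            taskExecutor.afterPropertiesSet();
            return taskExecutor;
        }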

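        @Bean
        @Qualifier("masterStep")
        public Step masterStep() {
            return stepBuilderFactory.get("masterStep")
                    // Name the partitions after the worker step and hand over the partitioner.
                    .partitioner("processData", partitioner())
                    // The worker step that runs once per partition (i.e. once per file).
                    .step(processData())
                    .taskExecutor(taskExecutor())
                    .listener(listener)
                    .build();
        }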


        @Bean
        @Qualifier("processData")
        public Step processData() {
            return stepBuilderFactory.get("processData")
                    .<pojo, pojo> chunk(5000)
                    .reader(reader)             
                    .processor(processor())
                    .writer(writer)         
                    .build();
        }



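        @Bean(name="reader")
        @StepScope
        public FlatFileItemReader<pojo> reader(@Value("#{stepExecutionContext['fileName']}") String filename)
                throws MalformedURLException {
            // MultiResourcePartitioner stores each file's URL under the "fileName" key by default.
            DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
            tokenizer.setNames(FILE_HEADER); // placeholder: String[] of column names, in file order

            BeanWrapperFieldSetMapper<pojo> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
            fieldSetMapper.setTargetType(pojo.class);

            DefaultLineMapper<pojo> lineMapper = new DefaultLineMapper<>();
            lineMapper.setLineTokenizer(tokenizer);
            lineMapper.setFieldSetMapper(fieldSetMapper);

            FlatFileItemReader<pojo> reader = new FlatFileItemReader<>();
            reader.setResource(new UrlResource(filename));
            reader.setLineMapper(lineMapper);
            return reader;
        }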
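
A note on sizing the thread pool for local partitioning: ThreadPoolTaskExecutor wraps the JDK's ThreadPoolExecutor, which only adds threads beyond the core size once its queue is full. With an unbounded queue (the ThreadPoolTaskExecutor default), the max pool size alone never takes effect, so the core pool size is what decides how many partitions run at once. A minimal sketch with the plain JDK executor, assuming 8 queued tasks stand in for 8 partitions (the class and method names are made up for the demo):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows how core vs. max pool size behave with an unbounded queue.
class PoolSizingDemo {

    // Runs taskCount short tasks and reports the largest number of threads the pool ever had.
    static int largestPoolSize(int corePoolSize, int maxPoolSize, int taskCount) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            for (int i = 0; i < taskCount; i++) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(20); // stand-in for processing one partition
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            executor.shutdown();
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        }
        return executor.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println(largestPoolSize(1, 4, 8)); // prints 1: extra tasks wait in the queue
        System.out.println(largestPoolSize(4, 4, 8)); // prints 4: one thread per core slot
    }
}
```

In the Spring configuration this corresponds to calling setCorePoolSize on the ThreadPoolTaskExecutor rather than relying on setMaxPoolSize alone.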