Step initialization time too long using Partitioner in Spring-Batch?

Posted by 那年仲夏 on 2019-12-13 07:25:46

Question


I'm using Partitioner to parallelize the import of *.csv files. There are about 30k files in the folder.

Problem: job initialization takes about 1-2 hours until all files are set up. The bottleneck is in SimpleStepExecutionSplitter.split().

Question: is it normal for step initialization to take that long? Or can I improve it somehow?

@Bean
public Step partitionStep(Partitioner partitioner) {
    return stepBuilderFactory.get("partitionStep")
            .partitioner(step())
            .partitioner("partitioner", partitioner)
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); // always import 4 files in parallel
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}


@Bean
public Partitioner partitioner() throws IOException {
    MultiResourcePartitioner p = new MultiResourcePartitioner();
    p.setResources(new PathMatchingResourcePatternResolver().getResources("mypath/*.csv"));
    return p;
}

Answer 1:


MultiResourcePartitioner creates one partition per resource. Creating the partitions is itself very fast (the partitioner returns the ExecutionContext map almost immediately), but Spring Batch spends a huge amount of time populating the corresponding metadata DB tables, and it becomes terribly slow once the number of partitions goes beyond 100 (this is all my personal experience).

According to the only answer here, some improvements were made, but I am using the latest version and it's still very slow with more than 100 partitions.

See this too.

I think you don't have much choice other than reducing the number of partitions, unless you are ready to rewrite a chunk of the API code yourself.
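
One way to reduce the number of partitions is to hand each partition a group of files instead of a single file. The following is a minimal sketch of just the grouping step in plain Java, assuming the file names are already available as strings; `FileGrouper`, `group`, and the `partitionN` key names are hypothetical and not part of Spring Batch. In a real job, a custom `Partitioner.partition(int gridSize)` implementation could store each group in one `ExecutionContext`, and the worker step would then need to read several files per partition (for example with a `MultiResourceItemReader`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileGrouper {

    /**
     * Distributes file names round-robin over at most gridSize groups,
     * so the number of partitions no longer grows with the number of files.
     * The key names are arbitrary (hypothetical, chosen for illustration).
     */
    public static Map<String, List<String>> group(List<String> files, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        // Never create more groups than there are files.
        int groups = Math.min(gridSize, files.size());
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < groups; i++) {
            result.put("partition" + i, new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            result.get("partition" + (i % groups)).add(files.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        // Five files spread over two partitions instead of five.
        System.out.println(group(files, 2));
    }
}
```

With 30k files and a grid size of, say, 50, this would yield 50 partitions of about 600 files each, which keeps the number of step executions written to the metadata tables small. Whether that grid size suits a given job is an assumption to verify by measurement.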




Answer 2:


I use a custom splitter because the default splitter (https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/java/org/springframework/batch/core/partition/support/SimpleStepExecutionSplitter.java) calls jobRepository.getLastStepExecution for each StepExecution. I don't use Spring Batch's restartability, so I can write my own splitter. Now step initialization takes a few seconds for thousands of files (before, it took a few minutes).
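
The effect of dropping the restart check can be illustrated without Spring Batch on the classpath. The sketch below is only a model: `CountingRepository`, `findLastExecution`, `splitWithRestartCheck`, and `splitWithoutRestartCheck` are hypothetical stand-ins, not the real `JobRepository` or `StepExecutionSplitter` API. It shows that a restart-aware split costs one repository lookup per partition, while a splitter that ignores restartability costs none.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitterSketch {

    /** Hypothetical stand-in for a job repository; counts simulated DB lookups. */
    public static class CountingRepository {
        public int lookups = 0;

        /** Simulates a "last execution of this step" query; always finds nothing. */
        public String findLastExecution(String stepName) {
            lookups++;
            return null;
        }
    }

    /** Restart-aware split: one lookup per partition to decide whether it must run. */
    public static List<String> splitWithRestartCheck(CountingRepository repo, List<String> partitions) {
        List<String> toRun = new ArrayList<>();
        for (String name : partitions) {
            if (repo.findLastExecution(name) == null) { // never ran before, so it is startable
                toRun.add(name);
            }
        }
        return toRun;
    }

    /** Non-restartable split: every partition is scheduled and no lookup is made. */
    public static List<String> splitWithoutRestartCheck(List<String> partitions) {
        return new ArrayList<>(partitions);
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partitions.add("partition" + i);
        }
        CountingRepository repo = new CountingRepository();
        splitWithRestartCheck(repo, partitions);
        System.out.println("lookups with restart check: " + repo.lookups);
    }
}
```

The trade-off is the one stated in the answer: without the lookup, a failed job cannot resume from the partitions that already completed, so this only fits jobs that are always rerun from scratch.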



Source: https://stackoverflow.com/questions/43800324/step-initialization-time-too-long-using-partitioner-in-spring-batch
