Spring Batch: multiple processes for heavy load, with multiple threads under every process

Submitted by 戏子无情 on 2021-02-17 06:43:43

Question


I have a scenario where I need roughly 50-60 different processes running concurrently, each executing a task.

Every process must fetch its data from the DB with a SQL query, passing in a value; the rows returned are the input for the subsequent task:

    select col_1, col_2, col_3 from table_1 where col_1 = :Process_1;

    @Bean
    public Job partitioningJob() throws Exception {
        return jobBuilderFactory.get("parallelJob")
                .incrementer(new RunIdIncrementer())
                .flow(masterStep())
                .end()
                .build();
    }

    @Bean
    public Step masterStep() throws Exception {
        // How can the values be fetched from configuration and passed to the partitioner one by one?
        // Can each process be given a name, so that it is easy to find in logs and monitoring?
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep())
                .partitioner("partition", partitioner())
                .gridSize(10)
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    @Bean
    public Partitioner partitioner() throws Exception {
        // Hit the DB with the SQL query and fetch the data.

    }

    @Bean
    public Step slaveStep() throws Exception {
        return stepBuilderFactory.get("slaveStep")
                .<Map<String, String>, Map<String, String>>chunk(1)
                .processTask()
                .build();
    }

Apache Camel has Aggregator and parallelProcessing; does Spring Batch have a similar feature that does the same job?

I am new to Spring Batch and am currently exploring whether it can handle the volume. This would be a heavily loaded application running 24x7, where every process needs to run concurrently and each process should be able to run multiple threads of its own.

Is there a way to monitor these processes so that, if one gets terminated for any reason, I can restart that particular process? Kindly suggest a solution to this problem.


Answer 1:


Answers to the questions above:

  1. parallelProcessing - Both local and remote partitioning support parallel processing and can handle very large volumes; we currently process 200 to 300 million records per day.

  2. Can it handle the volume - Yes, it can handle huge volumes and is well proven.

  3. Every process needs to run concurrently, and each process should be able to run multiple threads - Spring Batch takes care of this through your thread pool. Make sure you size the pool according to the available system resources.

  4. Is there a way to monitor these processes in case one gets terminated - Yes. Each parallel partition runs as a step, so you can monitor it in BATCH_STEP_EXECUTION, which holds all the details.

  5. Should be able to restart that particular process - Yes, this is a built-in feature: a restart resumes from the failed step. For huge-volume jobs we always use fault tolerance so that rejected records can be processed later. This is also a built-in feature.
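
The question's masterStep asks how to hand each process its own value and a recognizable name. A minimal plain-Java sketch of that layout follows, assuming the 50-60 process values have already been loaded into a list. The class name ProcessPartitionPlan, the processValue key, and the process- prefix are hypothetical and come from neither Spring Batch nor the example project. In a real job this logic would sit inside Partitioner.partition(int gridSize), with each inner map copied into an ExecutionContext.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model of a partition layout: one named partition per process value.
 * Hypothetical helper; a Spring Batch Partitioner would return
 * Map<String, ExecutionContext> instead of nested maps.
 */
final class ProcessPartitionPlan {

    /** Hypothetical context key that a worker-step reader would bind to :Process_1. */
    static final String PROCESS_KEY = "processValue";

    /** Builds one entry per process value, keyed by a descriptive partition name. */
    static Map<String, Map<String, Object>> plan(List<String> processValues) {
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        for (String value : processValues) {
            Map<String, Object> context = new LinkedHashMap<>();
            context.put(PROCESS_KEY, value);
            // The outer key plays the role of the partition name.
            partitions.put("process-" + value, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Example values only; real values would come from configuration or the DB.
        plan(List.of("A", "B", "C"))
                .forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

Spring Batch derives each worker step execution's name from the partition key, so a descriptive key such as process-A should appear in the STEP_NAME column of BATCH_STEP_EXECUTION. That gives every process a name that can be searched for in logs and monitoring queries.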

Example project:

https://github.com/ngecom/springBatchLocalParition/tree/master

The example uses an H2 database; the create-table script is in the resources folder. We always prefer data source pooling, with a connection pool size greater than the thread pool size.
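
The sizing rule above (connection pool larger than thread pool, thread pool sized to system resources) can be sketched as a small calculation. This is only an illustration: the class PoolSizing, the threads-per-core multiplier, and the headroom value are assumptions, not settings taken from the example project.

```java
/**
 * Hypothetical sizing helper: derives a worker thread count from the machine
 * and a connection pool size that always exceeds it.
 */
final class PoolSizing {

    /** Caps worker threads at the partition count and at cores * threadsPerCore. */
    static int workerThreads(int partitions, int cores, int threadsPerCore) {
        return Math.max(1, Math.min(partitions, cores * threadsPerCore));
    }

    /** One connection per worker thread plus at least one spare for job-repository updates. */
    static int connectionPoolSize(int workerThreads, int headroom) {
        return workerThreads + Math.max(1, headroom);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // 60 partitions as in the question; 4 threads per core is an assumed value for I/O-bound work.
        int threads = workerThreads(60, cores, 4);
        int connections = connectionPoolSize(threads, 5);
        System.out.println("worker threads = " + threads + ", connection pool = " + connections);
    }
}
```

The two results would then feed the thread pool configuration and spring.datasource.hikari.maximum-pool-size respectively. The right multiplier depends on how much time each partition spends waiting on the database, so it is worth measuring rather than guessing.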

Summary of the example project

  1. Reads from the table "customer" and divides the rows into step partitions.
  2. Each step partition writes to the new table "new_customer".
  3. The thread pool configuration is in JobConfiguration.java, in the method "taskExecutor()".
  4. The chunk size is set in slaveStep().
  5. You can estimate the memory needed from the number of parallel steps and set it as the JVM maximum heap size.

After running the job, these queries help you analyze the points raised in your questions:
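
A rough way to approach the memory estimate in the last point: a chunk-oriented step holds up to one chunk of items in memory per running partition, so the items in flight are about parallel steps times chunk size. The sketch below computes that figure under an assumed average item size. The class name HeapEstimate and all numbers are hypothetical, and the result covers only in-flight items, not the framework, the connection pool, or other application objects.

```java
/**
 * Hypothetical back-of-the-envelope estimate of memory held by in-flight chunk items.
 */
final class HeapEstimate {

    /** parallelSteps * chunkSize * bytesPerItem, failing loudly on arithmetic overflow. */
    static long inFlightBytes(int parallelSteps, int chunkSize, long bytesPerItem) {
        long items = Math.multiplyExact((long) parallelSteps, (long) chunkSize);
        return Math.multiplyExact(items, bytesPerItem);
    }

    public static void main(String[] args) {
        // Assumed values: 10 parallel steps, chunks of 1000 items, about 2 KiB per item.
        long bytes = inFlightBytes(10, 1000, 2048);
        System.out.println("in-flight items need roughly " + bytes / (1024 * 1024) + " MiB");
    }
}
```

The maximum heap (the -Xmx JVM option) needs to sit comfortably above this figure; measuring real item sizes with a profiler gives a far better input than a guessed average.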

SELECT * FROM NEW_CUSTOMER;   
SELECT * FROM BATCH_JOB_EXECUTION bje;
SELECT * FROM BATCH_STEP_EXECUTION bse WHERE JOB_EXECUTION_ID=2; 
SELECT * FROM BATCH_STEP_EXECUTION_CONTEXT bsec WHERE STEP_EXECUTION_ID=4; 

If you want to switch to MySQL, use the following data source configuration:

spring.datasource.hikari.minimum-idle=5 
spring.datasource.hikari.maximum-pool-size=100
spring.datasource.hikari.idle-timeout=600000 
spring.datasource.hikari.max-lifetime=1800000 
spring.datasource.hikari.auto-commit=true 
spring.datasource.hikari.poolName=SpringBoot-HikariCP
spring.datasource.url=jdbc:mysql://localhost:3306/ngecomdev
spring.datasource.username=ngecom
spring.datasource.password=ngbilling

Always refer to the GitHub URL below; you will get a lot of ideas from it.

https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples



Source: https://stackoverflow.com/questions/66029567/spring-batch-multiple-process-for-heavy-load-with-multiple-thread-under-every-pr
