Java 8 Stream with batch processing

醉梦人生 2020-11-28 02:42

I have a large file that contains a list of items.

I would like to create a batch of items, make an HTTP request with this batch (all of the items are needed as par

15 answers
  •  难免孤独
    2020-11-28 03:09

    Pure Java 8 solution:

    We can do this elegantly with a custom collector that takes a batch size and a Consumer to process each batch:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Set;
    import java.util.function.*;
    import java.util.stream.Collector;
    
    import static java.util.Objects.requireNonNull;
    
    
    /**
     * Collects elements in the stream and calls the supplied batch processor
     * after the configured batch size is reached.
     *
     * In case of a parallel stream, the batch processor may be called with
     * fewer elements than the batch size.
     *
     * The elements are not kept in memory, and the final result will be an
     * empty list.
     *
     * @param <T> Type of the elements being collected
     */
    class BatchCollector<T> implements Collector<T, List<T>, List<T>> {
    
        private final int batchSize;
        private final Consumer<List<T>> batchProcessor;
    
    
        /**
         * Constructs the batch collector
         *
         * @param batchSize the batch size after which the batchProcessor should be called
         * @param batchProcessor the batch processor which accepts batches of records to process
         */
        BatchCollector(int batchSize, Consumer<List<T>> batchProcessor) {
            batchProcessor = requireNonNull(batchProcessor);
    
            this.batchSize = batchSize;
            this.batchProcessor = batchProcessor;
        }
    
        public Supplier<List<T>> supplier() {
            return ArrayList::new;
        }
    
        public BiConsumer<List<T>, T> accumulator() {
            return (ts, t) -> {
                ts.add(t);
                if (ts.size() >= batchSize) {
                    batchProcessor.accept(ts);
                    ts.clear();
                }
            };
        }
    
        public BinaryOperator<List<T>> combiner() {
            return (ts, ots) -> {
                // process each parallel list without checking for batch size
                // avoids adding all elements of one to another
                // can be modified if a strict batching mode is required
                batchProcessor.accept(ts);
                batchProcessor.accept(ots);
                return Collections.emptyList();
            };
        }
    
        public Function<List<T>, List<T>> finisher() {
            return ts -> {
                batchProcessor.accept(ts);
                return Collections.emptyList();
            };
        }
    
        public Set<Characteristics> characteristics() {
            return Collections.emptySet();
        }
    }
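
    The combiner comment above mentions that a strict batching mode is possible. A rough sketch of one way to do it follows; this is not part of the original code. Instead of flushing both partial lists, the combiner merges the leftovers through the same add-and-flush step the accumulator uses, so only the final batch can be short. The names StrictBatching, strictBatching, addAndFlush and batchSizes are made up for illustration, and the collector is built with Collector.of rather than a dedicated class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class StrictBatching {

    // Hypothetical variant: merges leftover buffers instead of flushing them early.
    static <T> Collector<T, List<T>, List<T>> strictBatching(int batchSize, Consumer<List<T>> processor) {
        // Shared by accumulator and combiner: add one element, flush when the buffer is full.
        BiConsumer<List<T>, T> addAndFlush = (buffer, item) -> {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                processor.accept(buffer);
                buffer.clear();
            }
        };
        return Collector.<T, List<T>, List<T>>of(
            ArrayList::new,
            addAndFlush,
            (left, right) -> {
                // Both sides hold fewer than batchSize elements, so this loop is short.
                for (T item : right) {
                    addAndFlush.accept(left, item);
                }
                return left;
            },
            buffer -> {
                if (!buffer.isEmpty()) {
                    processor.accept(buffer); // the only batch that may be short
                }
                return Collections.emptyList();
            });
    }

    // Runs a parallel stream over 0..n-1 and records the size of every batch processed.
    static List<Integer> batchSizes(int n, int batchSize) {
        // The processor is called from several threads, so the recording list is synchronized.
        List<Integer> sizes = Collections.synchronizedList(new ArrayList<>());
        Consumer<List<Integer>> recorder = xs -> sizes.add(xs.size());
        IntStream.range(0, n).boxed().parallel().collect(strictBatching(batchSize, recorder));
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(100, 7));
    }
}
```

    With a parallel stream the batch processor is still invoked from multiple threads and the batches arrive in no particular order, so the processor itself has to be thread-safe.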
    

    Optionally, create a helper utility class:

    import java.util.List;
    import java.util.function.Consumer;
    import java.util.stream.Collector;
    
    public class StreamUtils {
    
        /**
         * Creates a new batch collector
         * @param batchSize the batch size after which the batchProcessor should be called
         * @param batchProcessor the batch processor which accepts batches of records to process
         * @param <T> the type of elements being processed
         * @return a batch collector instance
         */
        public static <T> Collector<T, List<T>, List<T>> batchCollector(int batchSize, Consumer<List<T>> batchProcessor) {
            return new BatchCollector<>(batchSize, batchProcessor);
        }
    }
    

    Example usage:

    List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    List<Integer> output = new ArrayList<>();
    
    int batchSize = 3;
    Consumer<List<Integer>> batchProcessor = xs -> output.addAll(xs);
    
    input.stream()
         .collect(StreamUtils.batchCollector(batchSize, batchProcessor));
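
    To see which batches the processor actually receives, here is a small self-contained sketch. It restates the same accumulator, combiner and finisher logic with Collector.of so that it compiles as a single file; BatchDemo, batching and collectBatches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collector;

class BatchDemo {

    // Hypothetical helper: same logic as the BatchCollector class, expressed via Collector.of.
    static <T> Collector<T, List<T>, List<T>> batching(int batchSize, Consumer<List<T>> processor) {
        return Collector.<T, List<T>, List<T>>of(
            ArrayList::new,
            (buffer, item) -> {
                buffer.add(item);
                if (buffer.size() >= batchSize) {
                    processor.accept(buffer);
                    buffer.clear(); // the same list is reused for the next batch
                }
            },
            (left, right) -> {
                processor.accept(left);
                processor.accept(right);
                return Collections.emptyList();
            },
            buffer -> {
                processor.accept(buffer); // leftover elements, possibly none
                return Collections.emptyList();
            });
    }

    // Runs the collector on a sequential stream and records a copy of every batch.
    static List<List<Integer>> collectBatches(List<Integer> input, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        // Copy each batch: the collector clears its buffer once the processor returns.
        Consumer<List<Integer>> recorder = xs -> batches.add(new ArrayList<>(xs));
        input.stream().collect(batching(batchSize, recorder));
        return batches;
    }

    public static void main(String[] args) {
        // prints [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
        System.out.println(collectBatches(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3));
    }
}
```

    Two things to watch for: a processor that keeps a reference to the list it is given will see it emptied afterwards, hence the copy; and when the input size is an exact multiple of the batch size, the finisher hands the processor an empty list, so a processor that makes an HTTP request may want to skip empty batches.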
    

    I've posted my code on GitHub as well, if anyone wants to take a look:

    Link to Github
