Java 8 Stream with batch processing

醉梦人生 2020-11-28 02:42

I have a large file that contains a list of items.

I would like to create a batch of items, make an HTTP request with this batch (all of the items are needed as par…

  • 2020-11-28 03:18

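    You can also use RxJava. These snippets use the RxJava 1.x API, where `Observable.from` accepts an `Iterable` or an array (not a `java.util.stream.Stream`), so `lazyFileStream` is assumed to be an `Iterable`: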

    Observable.from(data).buffer(BATCH_SIZE).forEach((batch) -> process(batch));
    

    or

    Observable.from(lazyFileStream).buffer(500).map((batch) -> process(batch)).toList();
    

    or

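    Observable.from(lazyFileStream).buffer(500).map(MyClass::process).toList();

    Note that `toList()` only builds another `Observable`: nothing is read or processed until it is subscribed to, for example with `.toBlocking().single()`.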
    
  • 2020-11-28 03:18

    A simple example using `Spliterator`:

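    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Spliterator;
    import java.util.stream.Stream;

    public class ChunkedLines { // class name is arbitrary
        public static void main(String[] args) {
            String fileName = args[0]; // path of the file to process
            // read file into stream, try-with-resources
            try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
                // skip header
                Spliterator<String> split = stream.skip(1).spliterator();
                Chunker<String> chunker = new Chunker<>();
                // tryAdvance feeds one line to the chunker and returns false once the stream is exhausted
                while (split.tryAdvance(chunker::doSomething)) {
                    // all work happens in Chunker.doSomething
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        static class Chunker<T> {
            int ct = 0;

            public void doSomething(T line) {
                System.out.println(ct++ + " " + line);
                // print a separator after every 100 lines
                if (ct % 100 == 0) {
                    System.out.println("====================chunk=====================");
                }
            }
        }
    }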
    

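    Bruce's answer is more comprehensive, but I was looking for something quick and dirty to process a bunch of files. Note that this only prints a separator every 100 lines; it does not collect the lines into batches.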
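    If you need the lines actually collected into lists (for example, one `List` per HTTP request), the same `tryAdvance` idea can be wrapped in a custom `Spliterator` that pulls up to `batchSize` elements per call. This is a minimal sketch, not a library API; the names `BatchingSpliterator` and `batches` are hypothetical. It stays lazy, so it can sit on top of `Files.lines(...)` without loading the whole file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: wraps a source spliterator and emits consecutive batches as lists.
final class BatchingSpliterator<T> extends Spliterators.AbstractSpliterator<List<T>> {
    private final Spliterator<T> source;
    private final int batchSize;

    private BatchingSpliterator(Spliterator<T> source, int batchSize) {
        // Size is reported as unknown; batches keep the source's encounter order
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean tryAdvance(Consumer<? super List<T>> action) {
        List<T> batch = new ArrayList<>(batchSize);
        // Pull elements until the batch is full or the source is exhausted
        while (batch.size() < batchSize && source.tryAdvance(batch::add)) {
            // batch::add has already stored the element
        }
        if (batch.isEmpty()) {
            return false; // no elements left
        }
        action.accept(batch); // the last batch may be smaller than batchSize
        return true;
    }

    // Returns a lazy, sequential stream of batches; closing it also closes the source stream.
    static <T> Stream<List<T>> batches(Stream<T> stream, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return StreamSupport.stream(new BatchingSpliterator<>(stream.spliterator(), batchSize), false)
                .onClose(stream::close);
    }
}
```

    Usage would look like `try (Stream<List<String>> b = BatchingSpliterator.batches(Files.lines(path), 500)) { b.forEach(batch -> send(batch)); }`, where `send` stands in for your own (hypothetical) HTTP call. Using try-with-resources on the batch stream closes the underlying file as well.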

  • 2020-11-28 03:19

    A pure Java 8 implementation is also possible:

    int BATCH = 500;
    IntStream.range(0, (data.size()+BATCH-1)/BATCH)
             .mapToObj(i -> data.subList(i*BATCH, Math.min(data.size(), (i+1)*BATCH)))
             .forEach(batch -> process(batch));
    

    Note that, unlike jOOλ, it can work nicely in parallel (provided that your data is a random-access list).
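    Wrapped in a small reusable method (the name `ListBatches.batches` is hypothetical), the same index arithmetic looks like the sketch below. The sublists are views backed by `data`, so the source list must not be structurally modified while the batches are in use.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper around the subList / IntStream.range technique.
final class ListBatches {
    private ListBatches() {
    }

    // Returns consecutive subList views of at most batchSize elements each.
    static <T> Stream<List<T>> batches(List<T> data, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int size = data.size();
        int batchCount = (size + batchSize - 1) / batchSize; // ceiling of size / batchSize
        return IntStream.range(0, batchCount)
                .mapToObj(i -> data.subList(i * batchSize, Math.min(size, (i + 1) * batchSize)));
    }
}
```

    Because each batch is computed from its index alone, `ListBatches.batches(data, 500).parallel().forEach(...)` can split the work across threads, and collecting a parallel stream still preserves batch order.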

  • 2020-11-28 03:24

    In all fairness, take a look at the elegant Vavr solution (here `Stream` is `io.vavr.collection.Stream`, not `java.util.stream.Stream`):

    Stream.ofAll(data).grouped(BATCH_SIZE).forEach(this::process);
    
  • 2020-11-28 03:27

    With Java 8 and Guava's `com.google.common.collect.Lists`, you can do something like:

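    import com.google.common.collect.Lists;
    import java.util.Collection;
    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class BatchProcessingUtil {
        public static <T,U> List<U> process(List<T> data, int batchSize, Function<List<T>, List<U>> processFunction) {
            List<List<T>> batches = Lists.partition(data, batchSize); // consecutive sublists of batchSize (last may be smaller)
            return batches.stream()
                    .map(processFunction) // Send each batch to the process function
                    .flatMap(Collection::stream) // flat results to gather them in 1 stream
                    .collect(Collectors.toList());
        }
    }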
    

    Here `T` is the type of the items in the input list and `U` is the type of the items in the output list.

    You can use it like this:

    List<String> userKeys = ...; // list of user keys
    List<Users> users = BatchProcessingUtil.process(
        userKeys,
        10, // batch size
        partialKeys -> service.getUsers(partialKeys) // must return a List<Users>
    );
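    If you would rather not add Guava just for `Lists.partition`, a JDK-only variant of the same utility can build the batches with `subList` instead. This is a sketch under that substitution; the class name `JdkBatchProcessing` is hypothetical, and `main` uses a stand-in function rather than a real service.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical JDK-only counterpart of BatchProcessingUtil (no Guava dependency).
final class JdkBatchProcessing {
    private JdkBatchProcessing() {
    }

    // Applies processFunction to each batch of at most batchSize items and concatenates the results in order.
    static <T, U> List<U> process(List<T> data, int batchSize, Function<List<T>, List<U>> processFunction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int size = data.size();
        return IntStream.range(0, (size + batchSize - 1) / batchSize)
                .mapToObj(i -> data.subList(i * batchSize, Math.min(size, (i + 1) * batchSize)))
                .map(processFunction) // one call per batch, e.g. one remote request
                .flatMap(Collection::stream) // flatten the per-batch results
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.rangeClosed(1, 25).boxed().collect(Collectors.toList());
        // Stand-in for a remote call: reports the batch size and maps each id to a string
        List<String> names = process(ids, 10, batch -> {
            System.out.println("fetching " + batch.size() + " ids");
            return batch.stream().map(id -> "user-" + id).collect(Collectors.toList());
        });
        System.out.println(names.size());
    }
}
```

    Running `main` prints `fetching 10 ids` twice, then `fetching 5 ids`, then `25`.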
    
  • 2020-11-28 03:28

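    This is a pure Java solution that is evaluated lazily. It relies on shared mutable state, so the returned stream must be consumed sequentially (do not call `.parallel()` on it).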

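    import java.util.ArrayList;
    import java.util.List;
    import java.util.Objects;
    import java.util.stream.Stream;

    class StreamPartition { // wrapper class; the name is arbitrary
        // Lazily groups a sequential stream into lists of at most batchSize elements
        public static <T> Stream<List<T>> partition(Stream<T> stream, int batchSize) {
            List<List<T>> currentBatch = new ArrayList<>(); // one-slot holder, so the batch can be swapped
            currentBatch.add(new ArrayList<>(batchSize));
            return Stream.concat(stream
                    .sequential() // the shared mutable holder is not thread-safe
                    .map(t -> {
                        currentBatch.get(0).add(t);
                        // Full batch: swap in an empty list and emit the full one (set returns the old element)
                        return currentBatch.get(0).size() == batchSize
                                ? currentBatch.set(0, new ArrayList<>(batchSize))
                                : null;
                    }),
                    // Once the source is exhausted, emit the trailing partial batch, if any
                    Stream.generate(() -> currentBatch.get(0).isEmpty() ? null : currentBatch.get(0))
                            .limit(1)
            ).filter(Objects::nonNull); // drop the nulls emitted for incomplete batches
        }
    }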
    