How to get multiple values from an object using a single stream operation?

前端 未结 5 1290
梦如初夏
梦如初夏 2020-12-10 05:45

I want to determine the minimum area required to display a collection of points. The easy way is to loop through the collection like this:

int minX = Integer         


        
相关标签:
5条回答
  • 2020-12-10 06:24

    You are able to use 2 Streams using Stream::reduce to get a point with a minimum and a point with a maximum. I don't recommend to concatenate the results to a one stream since there might be hard to distinguish the difference between the minimum, maximum and the coordinates.

    Point min = points
        .stream()
        .reduce((l, r) -> new Point(Math.min(l.y, r.y), Math.min(l.y, r.y))
        .orElse(new Point(-1, -1));
    
    Point max = points
        .stream()
        .reduce((l, r) -> new Point(Math.max(l.y, r.y), Math.max(l.y, r.y))
        .orElse(new Point(-1, -1));
    

    As the BinaryOperator<Point> use the two subsequent Points and a ternary operator to find out the minimum/maximum which is passed to a new object Point and returned using Optional::orElse with the default -1, -1 coordinates.

    0 讨论(0)
  • 2020-12-10 06:31

    By analogy with IntSummaryStatistics, create a class PointStatistics which collects the information you need. It defines two methods: one for recording values from a Point, one for combining two Statistics.

    class PointStatistics {
        private int minX = Integer.MAX_VALUE;
        private int maxX = Integer.MIN_VALUE;
    
        private int minY = Integer.MAX_VALUE;
        private int maxY = Integer.MIN_VALUE;
    
        public void accept(Point p) {
            minX = Math.min(minX, p.x);
            maxX = Math.max(maxX, p.x);
    
            minY = Math.min(minY, p.y);
            maxY = Math.max(minY, p.y);
        }
    
        public void combine(PointStatistics o) {
            minX = Math.min(minX, o.minX);
            maxX = Math.max(maxX, o.maxX);
    
            minY = Math.min(minY, o.minY);
            maxY = Math.max(maxY, o.maxY);
        }
    
        // getters
    }
    

    Then you can collect a Stream<Point> into a PointStatistics.

    class Program {
        public static void main(String[] args) {
            List<Point> points = new ArrayList<>();
    
            // populate 'points'
    
            PointStatistics statistics = points
                        .stream()
                        .collect(PointStatistics::new, PointStatistics::accept, PointStatistics::combine);
        }
    }
    

    UPDATE

    I was completely baffled by the conclusion drawn by OP, so I decided to write JMH benchmarks.

    Benchmark settings:

    # JMH version: 1.21
    # VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
    # Warmup: 1 iterations, 10 s each
    # Measurement: 10 iterations, 10 s each
    # Timeout: 10 min per iteration
    # Benchmark mode: Average time, time/op
    

    For each iteration, I was generating a shared list of random Points (new Point(random.nextInt(), random.nextInt())) of size 100K, 1M, 10M.

    The results are

    100K

    Benchmark                        Mode  Cnt  Score   Error  Units
    
    customCollector                  avgt   10  6.760 ± 0.789  ms/op
    forEach                          avgt   10  0.255 ± 0.033  ms/op
    fourStreams                      avgt   10  5.115 ± 1.149  ms/op
    statistics                       avgt   10  0.887 ± 0.114  ms/op
    twoStreams                       avgt   10  2.869 ± 0.567  ms/op
    

    1M

    Benchmark                        Mode  Cnt   Score   Error  Units
    
    customCollector                  avgt   10  68.117 ± 4.822  ms/op
    forEach                          avgt   10   3.939 ± 0.559  ms/op
    fourStreams                      avgt   10  57.800 ± 4.817  ms/op
    statistics                       avgt   10   9.904 ± 1.048  ms/op
    twoStreams                       avgt   10  32.303 ± 2.498  ms/op
    

    10M

    Benchmark                        Mode  Cnt    Score     Error  Units
    
    customCollector                  avgt   10  714.016 ± 151.558  ms/op
    forEach                          avgt   10   54.334 ±   9.820  ms/op
    fourStreams                      avgt   10  699.599 ± 138.332  ms/op
    statistics                       avgt   10  148.649 ±  26.248  ms/op
    twoStreams                       avgt   10  429.050 ±  72.879  ms/op
    
    0 讨论(0)
  • 2020-12-10 06:32

    JDK 12 will have Collectors.teeing (webrev and CSR), which collects to two different collectors and then merges both partial results into a final result.

    You could use it here to collect to two IntSummaryStatistics for both the x coordinate and the y coordinate:

    List<IntSummaryStatistics> stats = points.stream()
        .collect(Collectors.teeing(
                 Collectors.mapping(p -> p.x, Collectors.summarizingInt()),
                 Collectors.mapping(p -> p.y, Collectors.summarizingInt()),
                 List::of));
    
    int minX = stats.get(0).getMin();
    int maxX = stats.get(0).getMax();
    int minY = stats.get(1).getMin();
    int maxY = stats.get(1).getMax();
    

    Here the first collector gathers statistics for x and the second one for y. Then, statistics for both x and y are merged into a List by means of the JDK 9 List.of factory method that accepts two elements.

    An aternative to List::of for the merger would be:

    (xStats, yStats) -> Arrays.asList(xStats, yStats)
    

    If you happen to not have JDK 12 installed on your machine, here's a simplified, generic version of the teeing method that you can safely use as an utility method:

    public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
            Collector<? super T, A1, R1> downstream1,
            Collector<? super T, A2, R2> downstream2,
            BiFunction<? super R1, ? super R2, R> merger) {
    
        class Acc {
            A1 acc1 = downstream1.supplier().get();
            A2 acc2 = downstream2.supplier().get();
    
            void accumulate(T t) {
                downstream1.accumulator().accept(acc1, t);
                downstream2.accumulator().accept(acc2, t);
            }
    
            Acc combine(Acc other) {
                acc1 = downstream1.combiner().apply(acc1, other.acc1);
                acc2 = downstream2.combiner().apply(acc2, other.acc2);
                return this;
            }
    
            R applyMerger() {
                R1 r1 = downstream1.finisher().apply(acc1);
                R2 r2 = downstream2.finisher().apply(acc2);
                return merger.apply(r1, r2);
            }
        }
    
        return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
    }
    

    Please note that I'm not considering the characteristics of the downstream collectors when creating the returned collector.

    0 讨论(0)
  • 2020-12-10 06:38

    You could divide by two the iterations with summaryStatistics() while keeping a straight code :

    IntSummaryStatistics stat = points.stream().mapToInt(point -> point.x).summaryStatistics();
    int minX = stat.getMin();
    int maxX = stat.getMax();
    

    And do the same thing with point.y.
    You could factor out in this way :

    Function<ToIntFunction<Point>, IntSummaryStatistics> statFunction =
            intExtractor -> points.stream()
                                  .mapToInt(p -> intExtractor.applyAsInt(pp))
                                  .summaryStatistics();
    
    IntSummaryStatistics statX = statFunction.apply(p -> p.x);
    IntSummaryStatistics statY = statFunction.apply(p -> p.y);
    

    A custom collector is a possibility but note that you should implement the combiner part that will make your code harder to read.
    So but if you need to use parallel stream you should stay to the imperative way.
    While you could improve your actual code by relying the Math.min and Math.max functions :

    for (Point p : points) {
        minX = Math.min(p.x, minX);
        minY = Math.min(p.y, minY);
        maxY = Math.max(p.x, maxX);
        maxY = Math.max(p.y, maxY);
    }
    
    0 讨论(0)
  • 2020-12-10 06:45

    Thanks everyone for all the suggestions and answers. This is very helpful and I learned a lot!

    I decided to give most of your solutions a go (except for the JDK12 solution). For some of them you already provide me with the code. In addition, I made my own Collector.

    class extremesCollector implements Collector<Point, Map<String, Integer>, Map<String , Integer>> {
    
        @Override
        public Supplier<Map<String, Integer>> supplier() {
            Map<String, Integer> map = new HashMap<>();
            map.put("xMin", Integer.MAX_VALUE);
            map.put("yMin", Integer.MAX_VALUE);
            map.put("xMax", Integer.MIN_VALUE);
            map.put("yMax", Integer.MIN_VALUE);
            return () -> map;
        }
    
        @Override
        public BiConsumer<Map<String, Integer>, Point> accumulator() {
            return (a, b) -> {
                a.put("xMin", Math.min(a.get("xMin"), b.x));
                a.put("yMin", Math.min(a.get("yMin"), b.y));
                a.put("xMax", Math.max(a.get("xMax"), b.x));
                a.put("yMax", Math.max(a.get("yMax"), b.y));
            };
        }
    
        @Override
        public Function<Map<String, Integer>, Map<String, Integer>> finisher() {
            return Function.identity();
        }
    
        @Override
        public BinaryOperator<Map<String, Integer>> combiner() {
            return (a, b) -> {
                a.put("xMin", Math.min(a.get("xMin"), b.get("xMin")));
                a.put("yMin", Math.min(a.get("yMin"), b.get("yMin")));
                a.put("xMax", Math.max(a.get("xMax"), b.get("xMax")));
                a.put("yMax", Math.max(a.get("yMax"), b.get("yMax")));
                return a;
            };
        }
    
        @Override
        public Set<Characteristics> characteristics() {
            Set<Characteristics> characteristics = new HashSet<>();
            characteristics.add(Characteristics.UNORDERED);
            characteristics.add(Characteristics.CONCURRENT);
            characteristics.add(Characteristics.IDENTITY_FINISH);
            return characteristics;
        }
    }
    

    Results

    I gave all of them a try and compared the results. Good news: for all of them, I got the same result as far as the values are concerned!

    With respect to the speed, here is the ranking:

    1. for-loops
    2. four individual streams
    3. stream with self-made Collector
    4. parallel stream with self-made Collector
    5. statistics approach provided by Andrew Tobilko

    Number 2 and 3 are actually very close in terms of speed. The parallel version is probably slower because my dataset is too small.

    0 讨论(0)
提交回复
热议问题