Why don't primitive Stream have collect(Collector)?

后端未结

关注

 5  1381

-上瘾入骨i 2020-12-14 06:36

I\'m writing a library for novice programmers so I\'m trying to keep the API as clean as possible.

One of the things my Library needs to do is perform some complex c

5条回答

北荒 (楼主)

2020-12-14 07:18

Mr. Geotz provided the definitive answer for why the decision was made not to include specialized Collectors, however, I wanted to further investigate how much this decision affected performance.

I thought I would post my results as an answer.

I used the jmh microbenchmark framework to time how long it takes to compute calculations using both kinds of Collectors over collections of sizes 1, 100, 1000, 100,000 and 1 million:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class MyBenchmark {

@Param({"1", "100", "1000", "100000", "1000000"})
public int size;

List seqs;

@Setup
public void setup(){
    seqs = new ArrayList(size);
    Random rand = new Random();
    for(int i=0; i< size; i++){
        //these lengths are random but over 128 so no caching of Longs
        seqs.add(BusinessObjFactory.createOfRandomLength());
    }
}
@Benchmark
public double objectCollector() {       

    return seqs.stream()
                .map(BusinessObj::getLength)
                .collect(MyUtil.myCalcLongCollector())
                .getAsDouble();
}

@Benchmark
public double primitiveCollector() {

    LongStream stream= seqs.stream()
                                    .mapToLong(BusinessObj::getLength);
    return MyUtil.myCalc(stream)        
                        .getAsDouble();
}

public static void main(String[] args) throws RunnerException{
    Options opt = new OptionsBuilder()
                        .include(MyBenchmark.class.getSimpleName())
                        .build();

    new Runner(opt).run();
}

}

Here are the results:

# JMH 1.9.3 (released 4 days ago)
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
# VM options: 
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.objectCollector

# Run complete. Total time: 01:30:31

Benchmark                        (size)  Mode  Cnt          Score         Error  Units
MyBenchmark.objectCollector           1  avgt  200        140.803 ±       1.425  ns/op
MyBenchmark.objectCollector         100  avgt  200       5775.294 ±      67.871  ns/op
MyBenchmark.objectCollector        1000  avgt  200      70440.488 ±    1023.177  ns/op
MyBenchmark.objectCollector      100000  avgt  200   10292595.233 ±  101036.563  ns/op
MyBenchmark.objectCollector     1000000  avgt  200  100147057.376 ±  979662.707  ns/op
MyBenchmark.primitiveCollector        1  avgt  200        140.971 ±       1.382  ns/op
MyBenchmark.primitiveCollector      100  avgt  200       4654.527 ±      87.101  ns/op
MyBenchmark.primitiveCollector     1000  avgt  200      60929.398 ±    1127.517  ns/op
MyBenchmark.primitiveCollector   100000  avgt  200    9784655.013 ±  113339.448  ns/op
MyBenchmark.primitiveCollector  1000000  avgt  200   94822089.334 ± 1031475.051  ns/op

As you can see, the primitive Stream version is slightly faster, but even when there are 1 million elements in the collection, it is only 0.05 seconds faster (on average).

For my API I would rather keep to the cleaner Object Stream conventions and use the Boxed version since it is such a minor performance penalty.

Thanks to everyone who shed insight into this issue.

0 讨论(0)

查看其它5个回答