I\'m writing a library for novice programmers so I\'m trying to keep the API as clean as possible.
One of the things my Library needs to do is perform some complex c
Mr. Geotz provided the definitive answer for why the decision was made not to include specialized Collectors, however, I wanted to further investigate how much this decision affected performance.
I thought I would post my results as an answer.
I used the jmh microbenchmark framework to time how long it takes to compute calculations using both kinds of Collectors over collections of sizes 1, 100, 1000, 100,000 and 1 million:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class MyBenchmark {
@Param({"1", "100", "1000", "100000", "1000000"})
public int size;
List seqs;
@Setup
public void setup(){
seqs = new ArrayList(size);
Random rand = new Random();
for(int i=0; i< size; i++){
//these lengths are random but over 128 so no caching of Longs
seqs.add(BusinessObjFactory.createOfRandomLength());
}
}
@Benchmark
public double objectCollector() {
return seqs.stream()
.map(BusinessObj::getLength)
.collect(MyUtil.myCalcLongCollector())
.getAsDouble();
}
@Benchmark
public double primitiveCollector() {
LongStream stream= seqs.stream()
.mapToLong(BusinessObj::getLength);
return MyUtil.myCalc(stream)
.getAsDouble();
}
public static void main(String[] args) throws RunnerException{
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.build();
new Runner(opt).run();
}
}
Here are the results:
# JMH 1.9.3 (released 4 days ago)
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
# VM options:
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.objectCollector
# Run complete. Total time: 01:30:31
Benchmark (size) Mode Cnt Score Error Units
MyBenchmark.objectCollector 1 avgt 200 140.803 ± 1.425 ns/op
MyBenchmark.objectCollector 100 avgt 200 5775.294 ± 67.871 ns/op
MyBenchmark.objectCollector 1000 avgt 200 70440.488 ± 1023.177 ns/op
MyBenchmark.objectCollector 100000 avgt 200 10292595.233 ± 101036.563 ns/op
MyBenchmark.objectCollector 1000000 avgt 200 100147057.376 ± 979662.707 ns/op
MyBenchmark.primitiveCollector 1 avgt 200 140.971 ± 1.382 ns/op
MyBenchmark.primitiveCollector 100 avgt 200 4654.527 ± 87.101 ns/op
MyBenchmark.primitiveCollector 1000 avgt 200 60929.398 ± 1127.517 ns/op
MyBenchmark.primitiveCollector 100000 avgt 200 9784655.013 ± 113339.448 ns/op
MyBenchmark.primitiveCollector 1000000 avgt 200 94822089.334 ± 1031475.051 ns/op
As you can see, the primitive Stream version is slightly faster, but even when there are 1 million elements in the collection, it is only 0.05 seconds faster (on average).
For my API I would rather keep to the cleaner Object Stream conventions and use the Boxed version since it is such a minor performance penalty.
Thanks to everyone who shed insight into this issue.