We are working on a latency sensitive application and have been microbenchmarking all kinds of methods (using jmh). After microbenchmarking a lookup method and being satisfied
TL;DR: You should not put BLIND trust into anything.
First things first: it is important to verify the experimental data before jumping to the conclusions from them. Just claiming something is 3x faster/slower is odd, because you really need to follow up on the reason for the performance difference, not just trust the numbers. This is especially important for nano-benchmarks like you have.
Second, the experimenters should clearly understand what they control and what they don't. In your particular example, you are returning the value from @Benchmark
methods, but can you be reasonably sure the callers outside will do the same thing for primitive and the reference? If you ask yourself this question, then you'll realize you are basically measuring the test infrastructure.
Down to the point. On my machine (i5-4210U, Linux x86_64, JDK 8u40), the test yields:
Benchmark (value) Mode Samples Score Error Units
...benchmarkReturnOrdinal 3 thrpt 5 0.876 ± 0.023 ops/ns
...benchmarkReturnOrdinal 2 thrpt 5 0.876 ± 0.009 ops/ns
...benchmarkReturnOrdinal 1 thrpt 5 0.832 ± 0.048 ops/ns
...benchmarkReturnReference 3 thrpt 5 0.292 ± 0.006 ops/ns
...benchmarkReturnReference 2 thrpt 5 0.286 ± 0.024 ops/ns
...benchmarkReturnReference 1 thrpt 5 0.293 ± 0.008 ops/ns
Okay, so reference tests appear 3x slower. But wait, it uses an old JMH (1.1.1), let's update to current latest (1.7.1):
Benchmark (value) Mode Cnt Score Error Units
...benchmarkReturnOrdinal 3 thrpt 5 0.326 ± 0.010 ops/ns
...benchmarkReturnOrdinal 2 thrpt 5 0.329 ± 0.004 ops/ns
...benchmarkReturnOrdinal 1 thrpt 5 0.329 ± 0.004 ops/ns
...benchmarkReturnReference 3 thrpt 5 0.288 ± 0.005 ops/ns
...benchmarkReturnReference 2 thrpt 5 0.288 ± 0.005 ops/ns
...benchmarkReturnReference 1 thrpt 5 0.288 ± 0.002 ops/ns
Oops, now they are only barely slower. BTW, this also tells us the test is infrastructure-bound. Okay, can we see what really happens?
If you build the benchmarks, and look around what exactly calls your @Benchmark
methods, then you'll see something like:
public void benchmarkReturnOrdinal_thrpt_jmhStub(InfraControl control, RawResults result, ReturnEnumObjectVersusPrimitiveBenchmark_jmh l_returnenumobjectversusprimitivebenchmark0_0, Blackhole_jmh l_blackhole1_1) throws Throwable {
long operations = 0;
long realTime = 0;
result.startTime = System.nanoTime();
do {
l_blackhole1_1.consume(l_longname.benchmarkReturnOrdinal());
operations++;
} while(!control.isDone);
result.stopTime = System.nanoTime();
result.realTime = realTime;
result.measuredOps = operations;
}
That l_blackhole1_1
has a consume
method, which "consumes" the values (see Blackhole
for rationale). Blackhole.consume
has overloads for references and primitives, and that alone is enough to justify the performance difference.
There is a rationale why these methods look different: they are trying to be as fast as possible for their types of argument. They do not necessarily exhibit the same performance characteristics, even though we try to match them, hence the more symmetric result with newer JMH. Now, you can even go to -prof perfasm
to see the generated code for your tests and see why the performance is different, but that's beyond the point here.
If you really want to understand how returning the primitive and/or reference differs performance-wise, you would need to enter a big scary grey zone of nuanced performance benchmarking. E.g. something like this test:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
public class PrimVsRef {
@Benchmark
public void prim() {
doPrim();
}
@Benchmark
public void ref() {
doRef();
}
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private int doPrim() {
return 42;
}
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private Object doRef() {
return this;
}
}
...which yields the same result for primitives and references:
Benchmark Mode Cnt Score Error Units
PrimVsRef.prim avgt 25 2.637 ± 0.017 ns/op
PrimVsRef.ref avgt 25 2.634 ± 0.005 ns/op
As I said above, these tests require following up on the reasons for the results. In this case, the generated code for both is almost the same, and that explains the result.
prim:
[Verified Entry Point]
12.69% 1.81% 0x00007f5724aec100: mov %eax,-0x14000(%rsp)
0.90% 0.74% 0x00007f5724aec107: push %rbp
0.01% 0.01% 0x00007f5724aec108: sub $0x30,%rsp
12.23% 16.00% 0x00007f5724aec10c: mov $0x2a,%eax ; load "42"
0.95% 0.97% 0x00007f5724aec111: add $0x30,%rsp
0.02% 0x00007f5724aec115: pop %rbp
37.94% 54.70% 0x00007f5724aec116: test %eax,0x10d1aee4(%rip)
0.04% 0.02% 0x00007f5724aec11c: retq
ref:
[Verified Entry Point]
13.52% 1.45% 0x00007f1887e66700: mov %eax,-0x14000(%rsp)
0.60% 0.37% 0x00007f1887e66707: push %rbp
0.02% 0x00007f1887e66708: sub $0x30,%rsp
13.63% 16.91% 0x00007f1887e6670c: mov %rsi,%rax ; load "this"
0.50% 0.49% 0x00007f1887e6670f: add $0x30,%rsp
0.01% 0x00007f1887e66713: pop %rbp
39.18% 57.65% 0x00007f1887e66714: test %eax,0xe3e78e6(%rip)
0.02% 0x00007f1887e6671a: retq
[sarcasm] See how easy it is! [/sarcasm]
The pattern is: the simpler the question, the more you have to work out to make a plausible and reliable answer.