So I wanted to benchmark some basic Java functionality to add some information to this question: What is the gain from declaring a method as static?
I know writing b
It looks like the issue is the way Java is adding values to the variable r.
I've made a few changes, adding a method run2():
public class TestPerformanceOfStaticVsDynamicCalls {

    private static final long RUNS = 1_000_000_000L;

    public static void main(String[] args) {
        System.out.println("Test run 1 =================================");
        new TestPerformanceOfStaticVsDynamicCalls().run();
        System.out.println("Test run 2 =================================");
        new TestPerformanceOfStaticVsDynamicCalls().run2();
    }

    private void run2() {
        long r = 0;
        long start, end;
        for (int loop = 0; loop < 10; loop++) {
            // Benchmark
            long stat = 0;
            start = System.currentTimeMillis();
            for (long i = 0; i < RUNS; i++) {
                stat += addStatic(1, i);
            }
            end = System.currentTimeMillis();
            System.out.println("Static: " + (end - start) + " ms");
            long dyna = 0;
            start = System.currentTimeMillis();
            for (long i = 0; i < RUNS; i++) {
                dyna += addDynamic(1, i);
            }
            end = System.currentTimeMillis();
            System.out.println("Dynamic: " + (end - start) + " ms");
            // If you really want to have values in "r" then...
            r += stat + dyna;
            // Do something with r to keep compiler happy
            System.out.println(r);
        }
    }

    private void run() {
        long r = 0;
        long start, end;
        for (int loop = 0; loop < 10; loop++) {
            // Benchmark
            start = System.currentTimeMillis();
            for (long i = 0; i < RUNS; i++) {
                r += addStatic(1, i);
            }
            end = System.currentTimeMillis();
            System.out.println("Static: " + (end - start) + " ms");
            start = System.currentTimeMillis();
            for (long i = 0; i < RUNS; i++) {
                r += addDynamic(1, i);
            }
            end = System.currentTimeMillis();
            System.out.println("Dynamic: " + (end - start) + " ms");
            // If you really want to have values in "r" then...
            // Do something with r to keep compiler happy
            System.out.println(r);
        }
    }

    private long addDynamic(long a, long b) {
        return a + b;
    }

    private static long addStatic(long a, long b) {
        return a + b;
    }
}
The results are:
Test run 1 =================================
Static: 582 ms
Dynamic: 579 ms
1000000001000000000
Static: 2065 ms
Dynamic: 2352 ms
2000000002000000000
Static: 2084 ms
Dynamic: 2345 ms
3000000003000000000
Static: 2095 ms
Dynamic: 2347 ms
4000000004000000000
Static: 2102 ms
Dynamic: 2338 ms
5000000005000000000
Static: 2073 ms
Dynamic: 2345 ms
6000000006000000000
Static: 2074 ms
Dynamic: 2341 ms
7000000007000000000
Static: 2102 ms
Dynamic: 2355 ms
8000000008000000000
Static: 2062 ms
Dynamic: 2354 ms
9000000009000000000
Static: 2057 ms
Dynamic: 2350 ms
-8446744063709551616
Test run 2 =================================
Static: 584 ms
Dynamic: 582 ms
1000000001000000000
Static: 587 ms
Dynamic: 577 ms
2000000002000000000
Static: 577 ms
Dynamic: 579 ms
3000000003000000000
Static: 577 ms
Dynamic: 577 ms
4000000004000000000
Static: 578 ms
Dynamic: 579 ms
5000000005000000000
Static: 578 ms
Dynamic: 580 ms
6000000006000000000
Static: 577 ms
Dynamic: 579 ms
7000000007000000000
Static: 578 ms
Dynamic: 577 ms
8000000008000000000
Static: 580 ms
Dynamic: 578 ms
9000000009000000000
Static: 576 ms
Dynamic: 579 ms
-8446744063709551616
As for why adding directly to r is slower, I have no clue. Maybe somebody can provide more insight into why accessing r inside the loop block makes things much slower.
Just one additional note: I can only observe this strange behavior if I use long for r and the i loop counters. If I convert them to int, I get these timings:
Static: 352 ms
Dynamic: 353 ms
Static: 348 ms
Dynamic: 349 ms
Static: 349 ms
Dynamic: 348 ms
Static: 349 ms
Dynamic: 344 ms
So one possible conclusion is to avoid long in situations where performance matters, at least on Linux/AMD64 with Java 7.
Preamble: Writing microbenchmarks manually is almost always doomed to failure.
There are frameworks that have already solved the common benchmarking problems.
The JIT compilation unit is a method. Incorporating several benchmarks into a single method leads to unpredictable results.
JIT heavily relies on the execution profile, i.e. the run-time statistics. If a method runs the first scenario for a long time, JIT will optimize the generated code for it. When the method suddenly switches to another scenario, do not expect it to run at the same speed.
JIT may skip optimizing code that is not executed. It will leave an uncommon trap for this code. If the trap is ever hit, the JVM will deoptimize the compiled method, switch to the interpreter and after that recompile the code with the new knowledge. E.g. when your method run is compiled for the first time inside the first hot loop, JIT does not know about System.out.println yet. As soon as the execution reaches println, the earlier compiled code is likely to get deoptimized.
The bigger a method is, the harder it is for the JIT compiler to optimize it. E.g. there may not be enough spare registers to hold all local variables. That's what happens in your case.
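To illustrate the points above, here is a minimal sketch (not a replacement for a proper harness such as JMH) in which each measured loop lives in its own small method, so that each one is compiled and profiled on its own. The class name SeparateBenchmarks, the helper methods sumStatic and sumDynamic, the reduced RUNS value and the number of warm-up rounds are all assumptions made for this illustration. The JIT may still transform such trivial loops, so the timings should be read with caution.

```java
// Sketch only: every name here except addStatic/addDynamic is hypothetical.
class SeparateBenchmarks {

    // Assumed iteration count, smaller than the original RUNS to keep the sketch quick.
    static final long RUNS = 100_000_000L;

    static long addStatic(long a, long b) {
        return a + b;
    }

    long addDynamic(long a, long b) {
        return a + b;
    }

    // Contains only the static-call loop; the accumulator is a local of this small method.
    static long sumStatic(long runs) {
        long sum = 0;
        for (long i = 0; i < runs; i++) {
            sum += addStatic(1, i);
        }
        return sum;
    }

    // Contains only the instance-call loop.
    long sumDynamic(long runs) {
        long sum = 0;
        for (long i = 0; i < runs; i++) {
            sum += addDynamic(1, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        SeparateBenchmarks bench = new SeparateBenchmarks();
        long sink = 0;

        // Warm-up rounds: give the JIT a chance to compile both methods before timing.
        for (int k = 0; k < 5; k++) {
            sink += sumStatic(RUNS) + bench.sumDynamic(RUNS);
        }

        for (int loop = 0; loop < 5; loop++) {
            long start = System.nanoTime();
            sink += sumStatic(RUNS);
            long mid = System.nanoTime();
            sink += bench.sumDynamic(RUNS);
            long end = System.nanoTime();
            System.out.println("Static: " + (mid - start) / 1_000_000 + " ms");
            System.out.println("Dynamic: " + (end - mid) / 1_000_000 + " ms");
        }

        // Use the accumulated value so the results are not dead code.
        System.out.println(sink);
    }
}
```

Because println is only called from main, a deoptimization of main should not force the two loop methods to be recompiled with a mixed profile.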
To sum up, your benchmark seems to go through the following scenario:
1. The first hot loop (addStatic) triggers the compilation of the run method. The execution profile does not know anything except the addStatic method.
2. System.out.println triggers deoptimization, and after that the second hot loop (addDynamic) leads to recompilation of the run method.
3. The execution profile now contains information only about addDynamic, so JIT optimizes the second loop, and the first loop ends up with extra register spills:
Optimized loop:
0x0000000002d01054: add %rbx,%r14
0x0000000002d01057: add $0x1,%rbx ;*ladd
; - TestPerformanceOfStaticVsDynamicCalls::addDynamic@2
; - TestPerformanceOfStaticVsDynamicCalls::run@105
0x0000000002d0105b: add $0x1,%r14 ; OopMap{rbp=Oop off=127}
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@116
0x0000000002d0105f: test %eax,-0x1c91065(%rip) # 0x0000000001070000
;*lload
; - TestPerformanceOfStaticVsDynamicCalls::run@92
; {poll}
0x0000000002d01065: cmp $0x3b9aca00,%rbx
0x0000000002d0106c: jl 0x0000000002d01054
Loop with an extra register spill:
0x0000000002d011d0: mov 0x28(%rsp),%r11 <---- the problem is here
0x0000000002d011d5: add %r10,%r11
0x0000000002d011d8: add $0x1,%r10
0x0000000002d011dc: add $0x1,%r11
0x0000000002d011e0: mov %r11,0x28(%rsp) ;*ladd
; - TestPerformanceOfStaticVsDynamicCalls::addStatic@2
; - TestPerformanceOfStaticVsDynamicCalls::run@33
0x0000000002d011e5: mov 0x28(%rsp),%r11 <---- the problem is here
0x0000000002d011ea: add $0x1,%r11 ; OopMap{[32]=Oop off=526}
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@44
0x0000000002d011ee: test %eax,-0x1c911f4(%rip) # 0x0000000001070000
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@44
; {poll}
0x0000000002d011f4: cmp $0x3b9aca00,%r10
0x0000000002d011fb: jl 0x0000000002d011d0 ;*ifge
; - TestPerformanceOfStaticVsDynamicCalls::run@25
P.S. The following JVM options are useful to analyze the JIT compilation:
-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintAssembly -XX:CompileOnly=TestPerformanceOfStaticVsDynamicCalls