Java loop gets slower after some runs / JIT's fault?

执笔经年 2020-12-05 15:44

So I wanted to benchmark some basic Java functionality to add some information to this question: What is the gain from declaring a method as static?

I know writing b

3 Answers
  • 2020-12-05 16:35

    It looks like the cause is the way Java adds values to the variable r. I made a few changes, adding a method run2():

    public class TestPerformanceOfStaticVsDynamicCalls {
        private static final long RUNS = 1_000_000_000L;
        public static void main(String[] args) {
            System.out.println("Test run 1 =================================");
            new TestPerformanceOfStaticVsDynamicCalls().run();
            System.out.println("Test run 2 =================================");
            new TestPerformanceOfStaticVsDynamicCalls().run2();
        }
        private void run2() {
            long r = 0;
            long start, end;
            for (int loop = 0; loop < 10; loop++) {
                // Benchmark
                long stat = 0;
                start = System.currentTimeMillis();
                for (long i = 0; i < RUNS; i++) {
                    stat += addStatic(1, i);
                }
                end = System.currentTimeMillis();
                System.out.println("Static: " + (end - start) + " ms");
                long dyna = 0;
                start = System.currentTimeMillis();
                for (long i = 0; i < RUNS; i++) {
                    dyna += addDynamic(1, i);
                }
                end = System.currentTimeMillis();
                System.out.println("Dynamic: " + (end - start) + " ms");
                // If you really want to have values in "r" then...
                r += stat + dyna;
                // Do something with r to keep compiler happy
                System.out.println(r);
            }
        }
        private void run() {
            long r = 0;
            long start, end;
            for (int loop = 0; loop < 10; loop++) {
                // Benchmark
                start = System.currentTimeMillis();
                for (long i = 0; i < RUNS; i++) {
                    r += addStatic(1, i);
                }
                end = System.currentTimeMillis();
                System.out.println("Static: " + (end - start) + " ms");
                start = System.currentTimeMillis();
                for (long i = 0; i < RUNS; i++) {
                    r += addDynamic(1, i);
                }
                end = System.currentTimeMillis();
                System.out.println("Dynamic: " + (end - start) + " ms");
                // If you really want to have values in "r" then...
                // Do something with r to keep compiler happy
                System.out.println(r);
            }
        }
        private long addDynamic(long a, long b) {
            return a + b;
        }
        private static long addStatic(long a, long b) {
            return a + b;
        }
    }
    

    The results are:

    Test run 1 =================================
    Static: 582 ms
    Dynamic: 579 ms
    1000000001000000000
    Static: 2065 ms
    Dynamic: 2352 ms
    2000000002000000000
    Static: 2084 ms
    Dynamic: 2345 ms
    3000000003000000000
    Static: 2095 ms
    Dynamic: 2347 ms
    4000000004000000000
    Static: 2102 ms
    Dynamic: 2338 ms
    5000000005000000000
    Static: 2073 ms
    Dynamic: 2345 ms
    6000000006000000000
    Static: 2074 ms
    Dynamic: 2341 ms
    7000000007000000000
    Static: 2102 ms
    Dynamic: 2355 ms
    8000000008000000000
    Static: 2062 ms
    Dynamic: 2354 ms
    9000000009000000000
    Static: 2057 ms
    Dynamic: 2350 ms
    -8446744063709551616
    Test run 2 =================================
    Static: 584 ms
    Dynamic: 582 ms
    1000000001000000000
    Static: 587 ms
    Dynamic: 577 ms
    2000000002000000000
    Static: 577 ms
    Dynamic: 579 ms
    3000000003000000000
    Static: 577 ms
    Dynamic: 577 ms
    4000000004000000000
    Static: 578 ms
    Dynamic: 579 ms
    5000000005000000000
    Static: 578 ms
    Dynamic: 580 ms
    6000000006000000000
    Static: 577 ms
    Dynamic: 579 ms
    7000000007000000000
    Static: 578 ms
    Dynamic: 577 ms
    8000000008000000000
    Static: 580 ms
    Dynamic: 578 ms
    9000000009000000000
    Static: 576 ms
    Dynamic: 579 ms
    -8446744063709551616
    

    As for why adding directly to r is slower, I have no clue. Maybe somebody can provide more insight into why accessing r inside the loop body makes things so much slower.

  • 2020-12-05 16:41

    Just one additional note: I can only observe this strange behavior if I use long for r and the loop counters i. If I convert them to int, I get these timings:

    Static: 352 ms
    Dynamic: 353 ms
    Static: 348 ms
    Dynamic: 349 ms
    Static: 349 ms
    Dynamic: 348 ms
    Static: 349 ms
    Dynamic: 344 ms
    

    So one possible conclusion is to avoid long in such situations where performance matters, at least with Java 7 on Linux/AMD64.
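    A minimal sketch of the int-based variant described above, under stated assumptions: the class name IntVsLongLoops and the helpers sumInt/sumLong are hypothetical, and unlike the original benchmark each loop lives in its own method, which by itself may change the timings. With 1_000_000_000 runs the counter still fits in an int; the int sum silently wraps around on overflow, which is well defined in Java.

```java
/**
 * Hypothetical sketch: the same accumulation loop with int and with long
 * arithmetic. Each loop is in its own method (unlike the original run()).
 */
public class IntVsLongLoops {
    private static int addStaticInt(int a, int b) {
        return a + b;
    }

    private static long addStaticLong(long a, long b) {
        return a + b;
    }

    // Sums (1 + i) for i in [0, runs) using int arithmetic; overflow wraps silently.
    static int sumInt(int runs) {
        int r = 0;
        for (int i = 0; i < runs; i++) {
            r += addStaticInt(1, i);
        }
        return r;
    }

    // The same loop with long arithmetic, as in the original benchmark.
    static long sumLong(long runs) {
        long r = 0;
        for (long i = 0; i < runs; i++) {
            r += addStaticLong(1, i);
        }
        return r;
    }

    public static void main(String[] args) {
        final int runs = 1_000_000_000;
        for (int loop = 0; loop < 5; loop++) {
            long start = System.nanoTime();
            int ri = sumInt(runs);
            long mid = System.nanoTime();
            long rl = sumLong(runs);
            long end = System.nanoTime();
            // Printing the sums keeps the JIT from treating the loops as dead code.
            System.out.println("int:  " + (mid - start) / 1_000_000 + " ms (" + ri + ")");
            System.out.println("long: " + (end - mid) / 1_000_000 + " ms (" + rl + ")");
        }
    }
}
```

    For small inputs both helpers agree: sumInt(10) and sumLong(10) both return 55 (that is, 1 + 2 + ... + 10).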

  • 2020-12-05 16:48

    Preamble: Writing microbenchmarks manually is almost always doomed to failure.
    There are frameworks, such as JMH (Java Microbenchmark Harness), that have already solved the common benchmarking problems.

    1. The JIT compilation unit is a method. Combining several benchmarks in a single method leads to unpredictable results.

    2. The JIT relies heavily on the execution profile, i.e. run-time statistics. If a method runs the first scenario for a long time, the JIT will optimize the generated code for it. When the method suddenly switches to another scenario, do not expect it to run at the same speed.

    3. The JIT may skip optimizing code that has not been executed, leaving an uncommon trap in its place. If the trap is ever hit, the JVM deoptimizes the compiled method, switches to the interpreter, and later recompiles the code with the new knowledge. For example, when your run method is compiled for the first time inside the first hot loop, the JIT does not know about System.out.println yet. As soon as execution reaches println, the previously compiled code is likely to be deoptimized.

    4. The bigger the method, the harder it is for the JIT compiler to optimize. For example, there may not be enough spare registers to hold all local variables. That's what happened in your case.
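    Points 1 and 2 suggest giving each measured loop its own method, so that it is compiled and profiled on its own instead of sharing one large run() method. The sketch below uses hypothetical names (SeparateMethodsBenchmark, staticLoop, dynamicLoop); it is still a hand-written microbenchmark, so the caveats about deoptimization and warm-up still apply, and a benchmarking harness remains the more reliable option.

```java
/**
 * Hypothetical sketch: one measured loop per method, so each loop gets its own
 * compilation and its own execution profile.
 */
public class SeparateMethodsBenchmark {
    private long addDynamic(long a, long b) {
        return a + b;
    }

    private static long addStatic(long a, long b) {
        return a + b;
    }

    // Only the static-call loop lives here; this method's profile never sees the other loop.
    static long staticLoop(long runs) {
        long r = 0;
        for (long i = 0; i < runs; i++) {
            r += addStatic(1, i);
        }
        return r;
    }

    // Only the instance-call loop lives here.
    long dynamicLoop(long runs) {
        long r = 0;
        for (long i = 0; i < runs; i++) {
            r += addDynamic(1, i);
        }
        return r;
    }

    public static void main(String[] args) {
        final long runs = 1_000_000_000L;
        SeparateMethodsBenchmark b = new SeparateMethodsBenchmark();
        long sink = 0; // printed at the end so the loops are not dead code
        for (int loop = 0; loop < 10; loop++) {
            long start = System.nanoTime();
            sink += staticLoop(runs);
            System.out.println("Static: " + (System.nanoTime() - start) / 1_000_000 + " ms");

            start = System.nanoTime();
            sink += b.dynamicLoop(runs);
            System.out.println("Dynamic: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
        System.out.println(sink);
    }
}
```

    The accumulator r is now a local of a small method, similar in spirit to the separate stat/dyna locals in run2() from the first answer.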

    To sum it up, your benchmark seems to go through the following scenario:

    1. The first hot loop (addStatic) triggers compilation of the run method. At that point the execution profile knows about nothing except the addStatic method.
    2. System.out.println triggers deoptimization, and after that the second hot loop (addDynamic) leads to recompilation of the run method.
    3. Now the execution profile contains information only about addDynamic, so the JIT optimizes the second loop, while the first loop ends up with extra register spills:

    Optimized loop:

    0x0000000002d01054: add    %rbx,%r14
    0x0000000002d01057: add    $0x1,%rbx          ;*ladd
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::addDynamic@2
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@105
    
    0x0000000002d0105b: add    $0x1,%r14          ; OopMap{rbp=Oop off=127}
                                                  ;*goto
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@116
    
    0x0000000002d0105f: test   %eax,-0x1c91065(%rip)        # 0x0000000001070000
                                                  ;*lload
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@92
                                                  ;   {poll}
    0x0000000002d01065: cmp    $0x3b9aca00,%rbx
    0x0000000002d0106c: jl     0x0000000002d01054
    

    Loop with an extra register spill:

    0x0000000002d011d0: mov    0x28(%rsp),%r11  <---- the problem is here
    0x0000000002d011d5: add    %r10,%r11
    0x0000000002d011d8: add    $0x1,%r10
    0x0000000002d011dc: add    $0x1,%r11
    0x0000000002d011e0: mov    %r11,0x28(%rsp)    ;*ladd
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::addStatic@2
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@33
    
    0x0000000002d011e5: mov    0x28(%rsp),%r11  <---- the problem is here
    0x0000000002d011ea: add    $0x1,%r11          ; OopMap{[32]=Oop off=526}
                                                  ;*goto
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@44
    
    0x0000000002d011ee: test   %eax,-0x1c911f4(%rip)        # 0x0000000001070000
                                                  ;*goto
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@44
                                                  ;   {poll}
    0x0000000002d011f4: cmp    $0x3b9aca00,%r10
    0x0000000002d011fb: jl     0x0000000002d011d0  ;*ifge
                                                  ; - TestPerformanceOfStaticVsDynamicCalls::run@25
    

    P.S. The following JVM options are useful for analyzing JIT compilation (-XX:+PrintAssembly additionally requires the hsdis disassembler plugin to be installed):

    -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintAssembly -XX:CompileOnly=TestPerformanceOfStaticVsDynamicCalls
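    The flags above show what the JIT does from the outside. As a complementary in-process check, the standard java.lang.management API can report how much time the JIT has spent compiling, which hints at whether (re)compilation overlapped with a measured iteration. The class and method names below are hypothetical, and getTotalCompilationTime() returns an approximate total summed over compiler threads, so treat the output as a hint rather than proof.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical sketch: report JIT compilation time accumulated while each
 * measured iteration ran. Growth between iterations suggests (re)compilation.
 */
public class JitActivityProbe {
    // null if the JVM has no compilation system (e.g. interpreter-only mode).
    private static final CompilationMXBean JIT = ManagementFactory.getCompilationMXBean();

    // Accumulated compilation time in ms, or -1 if it cannot be monitored.
    static long totalCompilationMillis() {
        if (JIT == null || !JIT.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return JIT.getTotalCompilationTime();
    }

    // A simple hot loop to measure: sums (1 + i) for i in [0, runs).
    static long loop(long runs) {
        long r = 0;
        for (long i = 0; i < runs; i++) {
            r += 1 + i;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println("JIT: " + (JIT == null ? "none" : JIT.getName()));
        long sink = 0;
        for (int iter = 0; iter < 5; iter++) {
            long jitBefore = totalCompilationMillis();
            long start = System.nanoTime();
            sink += loop(100_000_000L);
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            long jitAfter = totalCompilationMillis();
            System.out.println("iteration " + iter + ": " + elapsed + " ms, JIT time during iteration: "
                    + (jitBefore < 0 ? "n/a" : (jitAfter - jitBefore) + " ms"));
        }
        System.out.println(sink); // keeps the loop results observable
    }
}
```

    Because compilation runs on background compiler threads, a non-zero delta does not by itself prove that the measured loop was slowed down; it only flags iterations worth a closer look with -XX:+PrintCompilation.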
    