Why is (a*b != 0) faster than (a != 0 && b != 0) in Java?

渐次进展 2021-01-29 17:34

I'm writing some code in Java where, at some point, the flow of the program is determined by whether two int variables, "a" and "b", are non-zero (note: a and b are never negative, and never within integer overflow range).
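
In code, the two variants compared in the title look like this (the if body here is just a placeholder):

    if (a != 0 && b != 0) { // short-circuits: b is only tested when a is non-zero
        // some code
    }

    // versus

    if (a * b != 0) {       // a single comparison, but only safe if a*b can never overflow to 0
        // some code
    }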

5 Answers
  •  野性不改
    2021-01-29 17:55

    I think your benchmark has some flaws and might not be useful for inferring about real programs. Here are my thoughts:

    • (a|b)!=0 and (a+b)!=0 test if either value is non-zero, whereas a != 0 && b != 0 and (a*b)!=0 test if both are non-zero. So you are not comparing the timing of just the arithmetic: if the condition is true more often, it causes more executions of the if body, which takes more time too.

    • (a+b)!=0 will do the wrong thing for positive and negative values that sum to zero, so you can't use it in the general case, even if it works here.

    • Similarly, (a*b)!=0 will do the wrong thing for values that overflow. (Random example: 196608 * 327680 is 0 because the true result happens to be divisible by 2^32, so its low 32 bits are 0, and those bits are all you get if it's an int operation.) The sketch after this list demonstrates these pitfalls concretely.

    • The VM will optimize the expression during the first few runs of the outer (fraction) loop, when fraction is 0 and the branches are almost never taken. The optimizer may do different things if you start fraction at 0.5.

    • Unless the VM is able to eliminate some of the array bounds checks here, there are four other branches in the expression just due to the bounds checks, and that's a complicating factor when trying to figure out what's happening at a low level. You might get different results if you split the two-dimensional array into two flat arrays, changing nums[0][i] and nums[1][i] to nums0[i] and nums1[i].

    • CPU branch predictors detect short patterns in the data, or runs of all branches being taken or not taken. Your randomly generated benchmark data is the worst-case scenario for a branch predictor. If real-world data has a predictable pattern, or it has long runs of all-zero and all-non-zero values, the branches could cost much less.

    • The particular code that is executed after the condition is met can affect the performance of evaluating the condition itself, because it affects things like whether or not the loop can be unrolled, which CPU registers are available, and if any of the fetched nums values need to be reused after evaluating the condition. Merely incrementing a counter in the benchmark is not a perfect placeholder for what real code would do.

    • System.currentTimeMillis() is, on most systems, no more accurate than +/- 10 ms; System.nanoTime() is usually more accurate.
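
    To make the correctness points above concrete, here is a small self-contained sketch (variable names are my own, not taken from your benchmark) showing where (a|b)!=0, (a+b)!=0 and (a*b)!=0 disagree with a != 0 && b != 0:

        public class ConditionPitfalls {
            public static void main(String[] args) {
                // (a|b) != 0 asks "is EITHER non-zero?", not "are BOTH non-zero?"
                int a = 0, b = 7;
                System.out.println(a != 0 && b != 0);   // false
                System.out.println((a | b) != 0);       // true  -- a different condition

                // (a+b) != 0 fails when the values cancel out
                a = 5; b = -5;
                System.out.println(a != 0 && b != 0);   // true
                System.out.println((a + b) != 0);       // false -- wrong answer

                // (a*b) != 0 fails when the product overflows to a multiple of 2^32
                a = 196608; b = 327680;                 // 3*2^16 and 5*2^16
                System.out.println(a != 0 && b != 0);   // true
                System.out.println((a * b) != 0);       // false -- low 32 bits of 15*2^32 are 0
            }
        }

    For timing, wrap the measured loop in System.nanoTime() calls rather than System.currentTimeMillis().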

    There are lots of uncertainties, and it's always hard to say anything definite with these sorts of micro-optimizations, because a trick that is faster on one VM or CPU can be slower on another. If you are running the 32-bit HotSpot JVM rather than the 64-bit version, be aware that it comes in two flavors: the "Client" VM has different (weaker) optimizations than the "Server" VM.
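
    If you're not sure which flavor you're running, the VM reports its name through a standard system property (the exact string varies by vendor and version):

        // Prints e.g. "Java HotSpot(TM) 64-Bit Server VM" or a "... Client VM" variant.
        System.out.println(System.getProperty("java.vm.name"));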

    If you can disassemble the machine code generated by the VM, do that rather than trying to guess what it does!
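
    For HotSpot, one way to do that is to run with the diagnostic flags below, assuming the hsdis disassembler plugin is installed for your platform; the class name here is just a placeholder for your own benchmark:

        java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourBenchmark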
