I have a piece of code where it appears, in every test I've run, that function calls have a significant amount of overhead. The code is a tight loop, performing a very simp
A method call by itself is not a problem, since hot methods are often inlined. The issue is a virtual call that does not get inlined.
In your code, the type profiler is fooled by the initialization method Image.random. When Image.process is JIT-compiled for the first time, it is optimized for calling random.nextInt(). Subsequent invocations of Image.process therefore hit an inline-cache miss, followed by a non-optimized virtual call to Shader.apply.
Remove the Image.process call from the initialization method, and the JIT will then inline the useful calls to Shader.apply.
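To make the fix concrete, here is a minimal sketch of the two ways to initialize the image. The question's actual classes are not shown here, so the Shader signature, the Image fields, and the names randomViaProcess and randomDirect are assumptions made up for illustration. The first initializer goes through process and so feeds the random-filling lambda into the type profile of the Shader.apply call site. The second fills the array directly and leaves that call site untouched.

```java
import java.util.Random;

public class ProfilePollutionSketch {
    // Hypothetical stand-in for the question's Shader interface.
    public interface Shader {
        int apply(int[] data, int s, int x, int y);
    }

    // Hypothetical stand-in for the question's Image class.
    public static final class Image {
        public final int[] data;
        public final int width;
        public final int height;

        public Image(int width, int height) {
            this.width = width;
            this.height = height;
            this.data = new int[width * height];
        }

        // Hot loop with a single virtual call site for shader.apply.
        public void process(Shader shader) {
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    data[y * width + x] = shader.apply(data, width, x, y);
                }
            }
        }

        // Initializer that routes through process(): the first receiver type
        // recorded at the call site is this lambda, not a real shader.
        public static Image randomViaProcess(int width, int height, long seed) {
            Image img = new Image(width, height);
            Random random = new Random(seed);
            img.process((d, s, x, y) -> random.nextInt(256));
            return img;
        }

        // Initializer that avoids process(): same contents, but the
        // shader.apply call site is not executed during initialization.
        public static Image randomDirect(int width, int height, long seed) {
            Image img = new Image(width, height);
            Random random = new Random(seed);
            for (int i = 0; i < img.data.length; i++) {
                img.data[i] = random.nextInt(256);
            }
            return img;
        }
    }
}
```

Both initializers produce the same pixel values for the same seed, because process visits the array in row-major order. Only the second one keeps the profile of process clean for the shaders you actually want to measure.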
After BlurShader.apply is inlined, you can help the JIT perform common subexpression elimination by replacing
final int p = s * y + x;
with
final int p = y * s + x;
The latter expression also appears in Image.process, so the JIT will not compute the same expression twice.
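As a rough illustration of why the operand order matters, the sketch below writes the index the same way in the caller and in the shader body. The method names sample and processPixel are hypothetical and only loosely modeled on Image.process and BlurShader.apply. Whether a given JVM would also match s * y + x against y * s + x is not something this sketch demonstrates. It only shows the shape of the code after the change.

```java
public class CseSketch {
    // Hypothetical shader body after the suggested change: the index is
    // written as y * s + x, the same form the caller uses.
    public static int sample(int[] data, int s, int x, int y) {
        final int p = y * s + x;
        return data[p];
    }

    // Hypothetical caller, loosely modeled on Image.process.
    public static void processPixel(int[] src, int[] dst, int s, int x, int y) {
        final int index = y * s + x;        // first occurrence of the expression
        dst[index] = sample(src, s, x, y);  // once inlined, p repeats the same expression
    }
}
```

Once sample is inlined into processPixel, both index computations sit in the same compiled method with identical text, which is the situation the answer relies on for the duplicate to be eliminated.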
After applying these two changes, I achieved the ideal benchmark score:
Benchmark                        Mode   Samples  Mean    Mean error  Units
s.ShaderBench.testProcessInline  thrpt  5        36,483  1,255       ops/s
s.ShaderBench.testProcessLambda  thrpt  5        36,323  0,936       ops/s
s.ShaderBench.testProcessProc    thrpt  5        36,163  1,421       ops/s