microbenchmark

First time a Java loop is run SLOW, why? [Sun HotSpot 1.5, sparc]

喜夏-厌秋 submitted on 2019-11-28 02:58:32
Question: In benchmarking some Java code on a Solaris SPARC box, I noticed that the first time I call the benchmarked function it runs EXTREMELY slowly (10x difference): First | 1 | 25295.979 ms; Second | 1 | 2256.990 ms; Third | 1 | 2250.575 ms. Why is this? I suspect the JIT compiler; is there any way to verify this? Edit: In light of some answers, I wanted to clarify that this code is the simplest possible test case I could find exhibiting this behavior. So my goal isn't to get it to run fast, but to
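One way to check the JIT suspicion is to time several consecutive calls in the same JVM and compare the first against the rest. The sketch below is only an illustration: the class name `WarmupDemo`, the `work` method and the loop sizes are hypothetical stand-ins, not the code from the question.

```java
class WarmupDemo {
    // Hypothetical workload: sums truncated square roots so the loop does real work.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) Math.sqrt(i);
        }
        return sum;
    }

    // Times `runs` consecutive calls of work(n); returns elapsed nanoseconds per call.
    static long[] timeRuns(int runs, int n) {
        long[] elapsed = new long[runs];
        long sink = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += work(n);
            elapsed[r] = System.nanoTime() - start;
        }
        // Use the accumulated result so the calls cannot be discarded as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] times = timeRuns(5, 5_000_000);
        for (int r = 0; r < times.length; r++) {
            System.out.printf("run %d: %.3f ms%n", r + 1, times[r] / 1e6);
        }
    }
}
```

Running it with the HotSpot flag `-XX:+PrintCompilation` prints a line for each method as it is compiled, which makes it possible to see whether compilation activity coincides with the slow first call.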

Why lambda IntStream.anyMatch() is 10 slower than naive implementation?

江枫思渺然 submitted on 2019-11-28 02:12:44
Question: I was recently profiling my code and found an interesting bottleneck in it. Here is the benchmark: @BenchmarkMode(Mode.Throughput) @Fork(1) @State(Scope.Thread) @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) public class Contains { private int[] ar = new int[] {1,2,3,4,5,6,7}; private int val = 5; @Benchmark public boolean naive() { return contains(ar, val); } @Benchmark public boolean

Why the bounds check doesn't get eliminated?

谁说我不能喝 submitted on 2019-11-27 22:00:37
Question: I wrote a simple benchmark to find out whether the bounds check can be eliminated when the array index is computed via bitwise AND. This is basically what nearly all hash tables do: they compute h & (table.length - 1) as an index into the table, where h is the hashCode or a derived value. The results show that the bounds check doesn't get eliminated. The idea of my benchmark is pretty simple: compute two values i and j, where both are guaranteed to be valid array indexes. i is the loop counter. When it gets used as an array index, the bounds check gets eliminated. j gets computed as x & (table
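For readers unfamiliar with the idiom, here is a minimal sketch of the masking trick the question describes. The names `MaskedIndex`, `indexFor` and `sumMasked` are hypothetical and are not taken from the question's benchmark. For any non-empty array, `h & (length - 1)` lies in `[0, length - 1]` because the mask is non-negative; a power-of-two length additionally lets every slot be reached.

```java
class MaskedIndex {
    // Maps an arbitrary hash (including negative values) to an index in [0, length - 1].
    // Assumes length >= 1; a power-of-two length makes every slot reachable.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Reads table[h & (table.length - 1)] for every hash; the index is always in range,
    // so no ArrayIndexOutOfBoundsException can occur for a non-empty table.
    static long sumMasked(int[] table, int[] hashes) {
        long sum = 0;
        for (int h : hashes) {
            sum += table[indexFor(h, table.length)];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] table = new int[16];
        for (int i = 0; i < table.length; i++) {
            table[i] = i;
        }
        int[] hashes = {-1, 17, Integer.MIN_VALUE, 123456789};
        System.out.println(sumMasked(table, hashes));
    }
}
```

Whether the JIT actually drops the range check for such an access is exactly what the question investigates; the sketch only shows why the check is logically redundant.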

Java for loop performance question

≡放荡痞女 submitted on 2019-11-27 14:43:36
Question: Consider this example: public static void main(final String[] args) { final List<String> myList = Arrays.asList("A", "B", "C", "D"); final long start = System.currentTimeMillis(); for (int i = 1000000; i > myList.size(); i--) { System.out.println("Hello"); } final long stop = System.currentTimeMillis(); System.out.println("Finish: " + (stop - start)); } vs public static void main(final String[] args) { final List<String> myList = Arrays.asList("A", "B", "C", "D"); final long start = System

Why is the StringBuilder chaining pattern sb.append(x).append(y) faster than regular sb.append(x); sb.append(y)?

纵然是瞬间 submitted on 2019-11-27 09:44:43
Question: I have a microbenchmark that shows very strange results: @BenchmarkMode(Mode.Throughput) @Fork(1) @State(Scope.Thread) @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000) @Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000) public class Chaining { private String a1 = "111111111111111111111111"; private String a2 = "222222222222222222222222"; private String a3 = "333333333333333333333333"; @Benchmark public String typicalChaining(

Strange JIT pessimization of a loop idiom

给你一囗甜甜゛ submitted on 2019-11-27 02:37:12
Question: While analyzing the results of a recent question here, I encountered a quite peculiar phenomenon: apparently an extra layer of HotSpot's JIT optimization actually slows down execution on my machine. Here is the code I used for the measurement: @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(Measure.ARRAY_SIZE) @Warmup(iterations = 2, time = 1) @Measurement(iterations = 5, time = 1) @State(Scope.Thread) @Threads(1) @Fork(2) public class

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

天大地大妈咪最大 submitted on 2019-11-26 14:55:49
Question: First, I have the setup below on an IvyBridge; I will insert the measured payload code at the commented location. The first 8 bytes of buf store the address of buf itself; I use this to create a loop-carried dependency: section .bss align 64 buf: resb 64 section .text global _start _start: mov rcx, 1000000000 mov qword [buf], buf mov rax, buf loop: ; I will insert payload here ; as is described below dec rcx jne loop xor rdi, rdi mov rax, 60 syscall case 1: I insert into the payload location: mov

How do I write a correct micro-benchmark in Java?

天大地大妈咪最大 submitted on 2019-11-25 22:09:21
Question: How do you write (and run) a correct micro-benchmark in Java? I'm looking for some code samples and comments illustrating various things to think about. Example: Should the benchmark measure time/iteration or iterations/time, and why? Related: Is stopwatch benchmarking acceptable? Answer 1: Tips about writing micro-benchmarks from the creators of Java HotSpot: Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro
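As a rough illustration of the kind of code sample the question asks for, the sketch below separates an untimed warm-up phase from a timed measurement phase and publishes every result to a volatile field so the work stays live. All names (`TinyBench`, `averageNanos`, `sink`) and iteration counts are hypothetical. It is a hand-rolled sketch, not a substitute for a harness such as JMH, which the other benchmarks on this page use.

```java
import java.util.function.LongSupplier;

class TinyBench {
    // Results are written here so the JIT cannot treat the measured work as dead code.
    static volatile long sink;

    // Runs `task` `warmup` times untimed, then `measured` times timed.
    // Returns the average elapsed nanoseconds per timed call (time/iteration).
    static double averageNanos(LongSupplier task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            sink = task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink = task.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        // Hypothetical task: sum the integers below 10,000.
        LongSupplier task = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += i;
            }
            return sum;
        };
        double avg = averageNanos(task, 20_000, 20_000);
        System.out.printf("average: %.1f ns per call%n", avg);
    }
}
```

The timer overhead is amortized over all timed calls by reading the clock only twice; for very short tasks the figure is still only indicative.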