microbenchmark

Capturing (externally) the memory consumption of a given Callback

戏子无情 submitted on 2019-11-29 15:54:31
The Problem

Let's say I have this function:

    function hog($i = 1) // uses $i * 0.5 MiB, returns $i * 0.25 MiB
    {
        $s = str_repeat('a', $i * 1024 * 512);
        return substr($s, $i * 1024 * 256);
    }

I would like to call it and be able to inspect the maximum amount of memory it uses. In other words: memory_get_function_peak_usage($callback). Is this possible?

What I Have Tried

I'm using the following values as my non-monotonically increasing $i argument for hog():

    $iterations = array_merge(range(0, 50, 10), range(50, 0, 5));
    $iterations = array_fill_keys($iterations, 0);

Which is essentially: ( [0] => 0
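The same idea, resetting peak counters before the call and reading them afterwards, can be sketched in Java, used here purely to illustrate the technique the question asks about; the helper name peakHeapDuring and the 10 MiB workload are hypothetical, and the numbers are approximate since GC activity and other threads also move the pool counters:

```java
// Sketch: approximate a callback's peak heap usage by resetting and reading
// the JVM's per-pool peak-usage counters around the call.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.List;
import java.util.function.Supplier;

public class PeakMemory {
    static long peakHeapDuring(Supplier<?> callback) {
        List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
        for (MemoryPoolMXBean p : pools)
            if (p.getType() == MemoryType.HEAP) p.resetPeakUsage();
        callback.get();  // run the callback whose footprint we want to observe
        long peak = 0;
        for (MemoryPoolMXBean p : pools)
            if (p.getType() == MemoryType.HEAP) peak += p.getPeakUsage().getUsed();
        return peak;
    }

    public static void main(String[] args) {
        // Hypothetical workload: allocate ~10 MiB inside the callback.
        long peak = peakHeapDuring(() -> new byte[10 * 1024 * 1024]);
        System.out.println("peak heap during callback: " + peak + " bytes");
    }
}
```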

Difference between MATLAB's numel and length functions

巧了我就是萌 submitted on 2019-11-29 15:53:18
Question: I know that length(x) returns max(size(x)) and numel(x) returns the total number of elements of x, but which is better for a 1-by-n array? Does it matter, or are they interchangeable in this case? EDIT: Just for kicks: it looks like they're the same performance-wise until you get to 100k elements.
Answer 1: In that case they return the same result and there is no difference. In terms of performance, it depends on the inner workings of arrays in MATLAB, e.g. if there are metainformations about how many elements

Why does jnz require 2 cycles to complete in an inner loop?

末鹿安然 submitted on 2019-11-29 14:22:05
I'm on an IvyBridge. I found the performance behavior of jnz inconsistent between the inner loop and the outer loop. The following simple program has an inner loop with a fixed size of 16:

    global _start
    _start:
        mov rcx, 100000000
    .loop_outer:
        mov rax, 16
    .loop_inner:
        dec rax
        jnz .loop_inner
        dec rcx
        jnz .loop_outer
        xor edi, edi
        mov eax, 60
        syscall

The perf tool shows the outer loop runs at 32 cycles per iteration, which suggests that jnz requires 2 cycles to complete. I then searched Agner's instruction tables; conditional jumps have a "reciprocal throughput" of 1-2, with the comment "fast if no jump". At this point I started to believe the above

First time a Java loop is run SLOW, why? [Sun HotSpot 1.5, sparc]

随声附和 submitted on 2019-11-29 13:18:11
While benchmarking some Java code on a Solaris SPARC box, I noticed that the first time I call the benchmarked function it runs EXTREMELY slowly (a 10x difference):

    First  | 1 | 25295.979 ms
    Second | 1 |  2256.990 ms
    Third  | 1 |  2250.575 ms

Why is this? I suspect the JIT compiler; is there any way to verify this? Edit: In light of some answers, I wanted to clarify that this code is the simplest possible test case I could find exhibiting this behavior. So my goal isn't to get it to run fast, but to understand what's going on so I can avoid it in my real benchmarks. Solved: Tom Hawtin correctly pointed
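The warm-up effect described here can be reproduced with a minimal sketch; the work() method below is a hypothetical stand-in for the original benchmarked function, which the excerpt does not show:

```java
// Minimal JIT warm-up demonstration: the first timed call typically includes
// interpretation and compilation overhead; later calls run compiled code.
// Running with -XX:+PrintCompilation shows when HotSpot compiles work().
public class WarmupDemo {
    // Stand-in workload (not the original code from the question).
    static long work() {
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) sum += i % 7;
        return sum;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 3; run++) {
            long t0 = System.nanoTime();
            long result = work();
            long t1 = System.nanoTime();
            System.out.printf("Run %d: %.3f ms (result=%d)%n",
                              run, (t1 - t0) / 1e6, result);
        }
    }
}
```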

Why is lambda IntStream.anyMatch() 10x slower than a naive implementation?

五迷三道 submitted on 2019-11-29 08:46:34
I was recently profiling my code and found an interesting bottleneck in it. Here is the benchmark:

    @BenchmarkMode(Mode.Throughput)
    @Fork(1)
    @State(Scope.Thread)
    @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
    public class Contains {
        private int[] ar = new int[] {1, 2, 3, 4, 5, 6, 7};
        private int val = 5;

        @Benchmark
        public boolean naive() {
            return contains(ar, val);
        }

        @Benchmark
        public boolean lambdaArrayStreamContains() {
            return Arrays.stream(ar).anyMatch(i -> i == val);
        }

        @Benchmark
        public boolean
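The excerpt cuts off before the contains helper that naive() calls; a plausible reconstruction (an assumption, since the original body is not shown) is a plain linear scan:

```java
// Hypothetical reconstruction of the truncated contains() helper:
// a plain linear scan over the array.
public class NaiveContains {
    static boolean contains(int[] ar, int val) {
        for (int x : ar) {
            if (x == val) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] ar = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(contains(ar, 5));  // true
        System.out.println(contains(ar, 9));  // false
    }
}
```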

Java for loop performance question

*爱你&永不变心* submitted on 2019-11-28 23:11:43
Considering this example:

    public static void main(final String[] args) {
        final List<String> myList = Arrays.asList("A", "B", "C", "D");
        final long start = System.currentTimeMillis();
        for (int i = 1000000; i > myList.size(); i--) {
            System.out.println("Hello");
        }
        final long stop = System.currentTimeMillis();
        System.out.println("Finish: " + (stop - start));
    }

vs.

    public static void main(final String[] args) {
        final List<String> myList = Arrays.asList("A", "B", "C", "D");
        final long start = System.currentTimeMillis();
        final int size = myList.size();
        for (int i = 1000000; i > size; i--) {
            System.out
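To make the comparison concrete without guessing at the truncated lines, here is a self-contained sketch of the two loop shapes, with the println replaced by a counter so the loops finish quickly (a rework for illustration, not the original code):

```java
// The point of the question: does evaluating size() in the loop condition
// on every iteration cost anything versus hoisting it into a local?
import java.util.Arrays;
import java.util.List;

public class LoopBound {
    static int callInCondition(List<String> list) {
        int count = 0;
        // size() appears in the condition, so it is evaluated each iteration
        for (int i = 1000000; i > list.size(); i--) count++;
        return count;
    }

    static int hoisted(List<String> list) {
        int count = 0;
        final int size = list.size();  // bound hoisted out of the loop
        for (int i = 1000000; i > size; i--) count++;
        return count;
    }

    public static void main(String[] args) {
        List<String> myList = Arrays.asList("A", "B", "C", "D");
        // Both variants run the same 999996 iterations.
        System.out.println(callInCondition(myList) == hoisted(myList));  // true
    }
}
```

(In practice the JIT can often hoist the size() call itself, which is why such micro-differences are hard to observe with wall-clock timing.)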

Why is the StringBuilder chaining pattern sb.append(x).append(y) faster than regular sb.append(x); sb.append(y)?

帅比萌擦擦* submitted on 2019-11-28 16:30:47
I have a microbenchmark that shows very strange results:

    @BenchmarkMode(Mode.Throughput)
    @Fork(1)
    @State(Scope.Thread)
    @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
    @Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
    public class Chaining {
        private String a1 = "111111111111111111111111";
        private String a2 = "222222222222222222222222";
        private String a3 = "333333333333333333333333";

        @Benchmark
        public String typicalChaining() {
            return new StringBuilder().append(a1).append(a2).append(a3).toString();
        }

        @Benchmark
        public String
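The second benchmark method is truncated; given the title, its non-chained counterpart would plausibly look like the following (a hedged reconstruction, not the original source):

```java
// Hypothetical reconstruction of the truncated non-chained benchmark body:
// the same three appends, but as separate statements on a local variable
// instead of one chained expression.
public class NoChaining {
    private String a1 = "111111111111111111111111";
    private String a2 = "222222222222222222222222";
    private String a3 = "333333333333333333333333";

    public String noChaining() {
        StringBuilder sb = new StringBuilder();
        sb.append(a1);
        sb.append(a2);
        sb.append(a3);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three 24-character strings concatenated: 72 characters total.
        System.out.println(new NoChaining().noChaining().length());  // 72
    }
}
```

Both forms do the same appends; the question is why the chained expression benchmarks faster.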

Calculating AES/CCM encryption time in Visual Studio 2017

╄→гoц情女王★ submitted on 2019-11-28 14:46:33
I am using the Crypto++ 5.6.5 library with Visual Studio 2017. How can I calculate the encryption time for AES-CCM? The Crypto++ wiki provides a Benchmarks article. It gives a lot of detail about library performance and how throughput is calculated, and it even references the source code where the actual throughput is measured. Believe it or not, a simple call to clock works just fine for measuring bulk encryption. Also see Benchmarks | Timing Loop in the same wiki article. To benchmark AES/CCM, do something like the

Hidden performance cost in Scala?

假装没事ソ submitted on 2019-11-28 13:53:53
Question: I came across this old question and did the following experiment with Scala 2.10.3. I rewrote the Scala version to use explicit tail recursion:

    import scala.annotation.tailrec

    object ScalaMain {
      private val t = 20

      private def run() {
        var i = 10
        while (!isEvenlyDivisible(2, i, t))
          i += 2
        println(i)
      }

      @tailrec
      private def isEvenlyDivisible(i: Int, a: Int, b: Int): Boolean = {
        if (i > b) true
        else (a % i == 0) && isEvenlyDivisible(i + 1, a, b)
      }

      def main(args: Array[String]) {
        val t1 = System

Strange JIT pessimization of a loop idiom

你说的曾经没有我的故事 submitted on 2019-11-28 09:03:58
While analyzing the results of a recent question here, I encountered a quite peculiar phenomenon: apparently an extra layer of HotSpot's JIT optimization actually slows down execution on my machine. Here is the code I used for the measurement:

    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @OperationsPerInvocation(Measure.ARRAY_SIZE)
    @Warmup(iterations = 2, time = 1)
    @Measurement(iterations = 5, time = 1)
    @State(Scope.Thread)
    @Threads(1)
    @Fork(2)
    public class Measure {
        public static final int ARRAY_SIZE = 1024;
        private final int[] array = new int[ARRAY_SIZE];