microbenchmark

Why jnz requires 2 cycles to complete in an inner loop

佐手、 submitted on 2020-01-20 08:07:45
Question: I'm on an IvyBridge. I found the performance behavior of jnz inconsistent between the inner loop and the outer loop. The following simple program has an inner loop with a fixed size of 16:

global _start
_start:
    mov rcx, 100000000
.loop_outer:
    mov rax, 16
.loop_inner:
    dec rax
    jnz .loop_inner
    dec rcx
    jnz .loop_outer
    xor edi, edi
    mov eax, 60
    syscall

The perf tool shows the outer loop runs at 32 cycles per iteration, which suggests that jnz takes 2 cycles to complete. I then searched Agner's instruction tables; a conditional jump has 1-2 …
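
Not part of the original question: a minimal C++/inline-asm sketch that reproduces the same nested loop, assuming GCC or Clang on x86-64. It measures with __rdtsc(), i.e. reference cycles rather than core clock cycles, so the number is only approximate; use perf stat for exact core-cycle counts.

```cpp
// Rough reproduction of the question's nested loop; compile with g++ -O2.
#include <cstdint>
#include <cstdio>
#include <x86intrin.h>

int main() {
    const uint64_t outer = 100000000;      // same outer count as the question
    uint64_t t0 = __rdtsc();
    asm volatile(
        "mov  %[n], %%rcx   \n\t"
        "1:                 \n\t"          // .loop_outer
        "mov  $16, %%rax    \n\t"
        "2:                 \n\t"          // .loop_inner
        "dec  %%rax         \n\t"
        "jnz  2b            \n\t"
        "dec  %%rcx         \n\t"
        "jnz  1b            \n\t"
        :
        : [n] "r"(outer)
        : "rax", "rcx", "cc");
    uint64_t t1 = __rdtsc();
    printf("%.2f reference cycles per outer iteration\n",
           (double)(t1 - t0) / (double)outer);
}
```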

How to use Caliper benchmark beta snapshot without maven?

谁说我不能喝 submitted on 2020-01-15 06:17:30
Question: I have been asked to use Google's Caliper project to create a few microbenchmarks. I would very much like to use the annotation features of the newest beta snapshot, but aside from a few small examples I am having trouble finding good documentation on how to actually run the thing. There is a video tutorial that walks users through the new Maven integration feature, which I was also asked NOT to use. Right now I just have a small example stripped from one of theirs, modified with some …

Long latency instruction

我怕爱的太早我们不能终老 submitted on 2020-01-14 19:39:12
Question: I would like a long-latency, single-uop x86 instruction, in order to create long dependency chains as part of testing microarchitectural features. Currently I'm using fsqrt, but I'm wondering if there is something better. Ideally, the instruction will score well on the following criteria:

- Long latency
- Stable/fixed latency
- One or a few uops (especially: not microcoded)
- Consumes as few uarch resources as possible (load/store buffers, page walkers, etc.)
- Able to chain (latency-wise) with itself
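
Not from the question: a rough C++ sketch of the kind of latency-bound dependency chain such an instruction would feed, built here from sqrt because each iteration's input depends on the previous result. Compile with something like -O2 -fno-math-errno so a bare sqrtsd is emitted instead of a libm call; the iteration count and constants are arbitrary.

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

int main() {
    const long iters = 100000000;
    double x = 1.0e18;                    // stays far from zero/denormals
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        // Serial chain: the next sqrt cannot start until the previous sqrt
        // (plus the add) has finished, so the loop measures latency, not throughput.
        x = std::sqrt(x) + 1.0e18;
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> ns = stop - start;
    // Printing x keeps the chain from being optimized away.
    printf("%.2f ns per chained iteration (x = %g)\n", ns.count() / iters, x);
}
```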

Fastest Linux system call

偶尔善良 submitted on 2020-01-10 02:53:08
Question: On an x86-64 Intel system that supports syscall and sysret, what's the "fastest" system call from 64-bit user code on a vanilla kernel? In particular, it must be a system call that exercises the syscall / sysret user <-> kernel transition, but does the least amount of work beyond that. It doesn't even need to perform the system call itself: some type of early error which never dispatches to the specific call on the kernel side is fine, as long as it doesn't go down some slow path because of that.
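
Not from the question: a rough C++ timing sketch, assuming Linux x86-64 and glibc's syscall(2) wrapper. An out-of-range number such as -1 is rejected with -ENOSYS before any handler is dispatched, but whether that stays on the fastest path can vary by kernel, so it is worth comparing against a trivial real call like getpid. Iteration count is arbitrary.

```cpp
#include <chrono>
#include <cstdio>
#include <unistd.h>
#include <sys/syscall.h>

// Average nanoseconds per user->kernel->user round trip for a given syscall number.
static double time_ns(long number, long iters) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i)
        syscall(number);                  // return value deliberately ignored
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> ns = stop - start;
    return ns.count() / iters;
}

int main() {
    const long iters = 1000000;
    printf("syscall(-1): %.1f ns per round trip\n", time_ns(-1, iters));
    printf("getpid():    %.1f ns per round trip\n", time_ns(SYS_getpid, iters));
}
```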

JMH: don't take into account inner method time

走远了吗. submitted on 2020-01-04 23:25:17
Question: I have methods like this:

@GenerateMicroBenchmark
public static void calculateArraySummary(String[] args) {
    // create a random data set
    /* PROBLEM HERE:
     * now I measure not only pool.invoke(finder) time,
     * but also generateRandomArray method time
     */
    final int[] array = generateRandomArray(1000000);
    // submit the task to the pool
    final ForkJoinPool pool = new ForkJoinPool(4);
    final ArraySummator finder = new ArraySummator(array);
    System.out.println(pool.invoke(finder));
}

private static int[] …

Capturing (externally) the memory consumption of a given Callback

萝らか妹 submitted on 2019-12-29 09:09:05
Question: The problem: let's say I have this function:

function hog($i = 1) // uses $i * 0.5 MiB, returns $i * 0.25 MiB
{
    $s = str_repeat('a', $i * 1024 * 512);
    return substr($s, $i * 1024 * 256);
}

I would like to call it and be able to inspect the maximum amount of memory it uses. In other words: memory_get_function_peak_usage($callback);. Is this possible? What I have tried: I'm using the following values as my non-monotonically increasing $i argument for hog(): $iterations = array_merge(range(0, 50, …

How to minimize the costs for allocating and initializing an NSDateFormatter?

纵然是瞬间 submitted on 2019-12-29 07:37:06
Question: I noticed that using an NSDateFormatter can be quite costly. I figured out that allocating and initializing the object already consumes a lot of time. Further, it seems that using an NSDateFormatter from multiple threads increases the cost. Can there be blocking where the threads have to wait for each other? I created a small test application to illustrate the problem; please check it out: http://github.com/johnjohndoe/TestNSDateFormatter git://github.com/johnjohndoe/TestNSDateFormatter.git

Calculate encryption time for AES/CCM in Visual Studio 2017

孤人 submitted on 2019-12-29 02:14:08
Question: I am using the Crypto++ 5.6.5 library and Visual Studio 2017. How can I calculate the encryption time for AES-CCM?

Answer 1: "I would like to know how to calculate the encryption time for AES-CCM." The Crypto++ wiki provides an article, Benchmarks, with a lot of detail on library performance, how throughput is calculated, and references to the source code where the actual throughput is measured. Believe it or not, a simple call to clock works just fine for measuring bulk encryption.
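
Not part of the answer as written, but a rough sketch of the clock-based approach it describes, assuming Crypto++ is installed with the usual <cryptopp/...> header layout and linked as a library. The buffer size, iteration count, all-zero key/nonce, and 8-byte tag are arbitrary demo choices, not values from the question.

```cpp
#include <ctime>
#include <iostream>
#include <string>
#include <cryptopp/aes.h>
#include <cryptopp/ccm.h>
#include <cryptopp/filters.h>

int main() {
    using namespace CryptoPP;
    byte key[AES::DEFAULT_KEYLENGTH] = {0};   // 16-byte demo key
    byte iv[12] = {0};                        // CCM nonce must be 7..13 bytes
    std::string plain(1024 * 1024, 'A'), cipher;

    const int iters = 50;
    std::clock_t start = std::clock();
    for (int i = 0; i < iters; ++i) {
        CCM<AES, 8>::Encryption enc;          // 8-byte authentication tag
        enc.SetKeyWithIV(key, sizeof(key), iv, sizeof(iv));
        enc.SpecifyDataLengths(0, plain.size(), 0);
        cipher.clear();
        StringSource(plain, true,
            new AuthenticatedEncryptionFilter(enc, new StringSink(cipher)));
    }
    std::clock_t stop = std::clock();

    double secs = double(stop - start) / CLOCKS_PER_SEC;
    double mib = double(iters) * plain.size() / (1024.0 * 1024.0);
    std::cout << mib / secs << " MiB/s (CPU time)" << std::endl;
}
```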

Why do two consecutive calls to the same method yield different times for execution?

假如想象 submitted on 2019-12-24 00:59:27
Question: Here is a sample code:

public class TestIO {
    public static void main(String[] str) {
        TestIO t = new TestIO();
        t.fOne();
        t.fTwo();
        t.fOne();
        t.fTwo();
    }

    public void fOne() {
        long t1, t2;
        t1 = System.nanoTime();
        int i = 10;
        int j = 10;
        int k = j * i;
        System.out.println(k);
        t2 = System.nanoTime();
        System.out.println("Time taken by 'fOne' ... " + (t2 - t1));
    }

    public void fTwo() {
        long t1, t2;
        t1 = System.nanoTime();
        int i = 10;
        int j = 10;
        int k = j * i;
        System.out.println(k);
        t2 = System.nanoTime();
        …