microbenchmark

Why might this SIMD array-adding sample not be demonstrating any performance gains over a naive implementation?

时光总嘲笑我的痴心妄想 submitted on 2019-12-23 20:09:26
Question: class Program { static void Main(string[] args) { Console.WriteLine(Vector.IsHardwareAccelerated ? "SIMD supported" : "SIMD not supported."); var rand = new Random(); var numNums = 10000000; var arr1 = Enumerable.Repeat(0, numNums).Select(x => (int) (rand.NextDouble() * 100)).ToArray(); var arr2 = Enumerable.Repeat(0, numNums).Select(x => (int) (rand.NextDouble() * 100)).ToArray(); var simdResult = new int [numNums]; var conventionalResult = new int [numNums]; var watch = System.Diagnostics

Why is String concatenation faster than String.valueOf for converting an Integer to a String?

爷,独闯天下 submitted on 2019-12-23 06:49:19
Question: I have a benchmark: @BenchmarkMode(Mode.Throughput) @Fork(1) @State(Scope.Thread) @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000) @Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000) public class StringConcatTest { private int aInt; @Setup public void prepare() { aInt = 100; } @Benchmark public String emptyStringInt() { return "" + aInt; } @Benchmark public String valueOfInt() { return String.valueOf(aInt); } } And here is
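As a sanity check before comparing speed, the two conversions in the benchmark above can be confirmed to return the same text. The following is a minimal sketch in plain Java without JMH; the class name `IntToStringSketch` and the helpers `viaConcat` and `viaValueOf` are hypothetical names introduced here, not part of the original benchmark. It says nothing about which variant is faster.

```java
public class IntToStringSketch {
    // Hypothetical helper mirroring emptyStringInt() from the benchmark.
    static String viaConcat(int value) {
        return "" + value;
    }

    // Hypothetical helper mirroring valueOfInt() from the benchmark.
    static String viaValueOf(int value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        int aInt = 100;
        // Both conversions should yield equal strings.
        System.out.println(viaConcat(aInt).equals(viaValueOf(aInt)));
    }
}
```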

Measure precision of timer (e.g. Stopwatch/QueryPerformanceCounter)

回眸只為那壹抹淺笑 submitted on 2019-12-22 08:03:39
Question: Given that the Stopwatch class in C# can use something like three different timers underneath, e.g.: the system timer, with a precision of approx. ±10 ms (depending on the timer resolution, which can be set with timeBeginPeriod, it can be approx. ±1 ms); the Time Stamp Counter (TSC), e.g. with a tick frequency of 2.5 MHz, or 1 tick = 400 ns, so ideally a precision of that; the High Precision Event Timer (HPET), e.g. with a tick frequency of 25 MHz, or 1 tick = 40 ns, so ideally a precision of that; how can we measure the
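One common way to probe a timer empirically is to read it in a tight loop and record the smallest nonzero step between consecutive readings. The sketch below does this in Java with `System.nanoTime()` as a stand-in for the C# Stopwatch discussed in the question; the class and method names are hypothetical. The result includes the cost of the call itself, so it should be read as an upper bound on the observable granularity, not as the timer's true tick length.

```java
public class TimerGranularitySketch {
    // Returns the smallest positive difference seen between two consecutive
    // System.nanoTime() readings over the given number of samples.
    static long smallestObservedStepNanos(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            long next = System.nanoTime();
            // Spin until the reported value changes.
            while (next == start) {
                next = System.nanoTime();
            }
            smallest = Math.min(smallest, next - start);
        }
        return smallest;
    }

    public static void main(String[] args) {
        long step = smallestObservedStepNanos(100_000);
        System.out.println("Smallest observed step: " + step + " ns");
    }
}
```

Running it several times and on an otherwise idle machine gives a rough feel for how stable the figure is.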

System.arraycopy with constant length

别说谁变了你拦得住时间么 submitted on 2019-12-22 01:53:58
Question: I'm playing around with JMH (http://openjdk.java.net/projects/code-tools/jmh/) and I just stumbled on a strange result. I'm benchmarking ways to make a shallow copy of an array and I can observe the expected results (that looping through the array is a bad idea and that there is no significant difference between #clone(), System#arraycopy() and Arrays#copyOf(), performance-wise). Except that System#arraycopy() is one-quarter slower when the array's length is hard-coded... Wait, what? How
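For reference, the four copying approaches named in the question can be written as follows. This is a plain-Java sketch without JMH annotations, using hypothetical method names; it only shows that the approaches produce equal copies and makes no claim about their relative speed.

```java
import java.util.Arrays;

public class ShallowCopySketch {
    // Element-by-element copy in a manual loop.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Copy via the array's own clone() method.
    static int[] copyWithClone(int[] src) {
        return src.clone();
    }

    // Copy via System.arraycopy into a freshly allocated array.
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Copy via Arrays.copyOf, which allocates and copies in one call.
    static int[] copyWithCopyOf(int[] src) {
        return Arrays.copyOf(src, src.length);
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        System.out.println(Arrays.equals(copyWithLoop(src), copyWithClone(src))
                && Arrays.equals(copyWithArraycopy(src), copyWithCopyOf(src)));
    }
}
```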

Why is Arrays.copyOf 2 times faster than System.arraycopy for small arrays?

萝らか妹 submitted on 2019-12-21 10:49:54
Question: I was recently playing with some benchmarks and found very interesting results that I can't explain right now. Here is the benchmark: @BenchmarkMode(Mode.Throughput) @Fork(1) @State(Scope.Thread) @Warmup(iterations = 10, time = 1, batchSize = 1000) @Measurement(iterations = 10, time = 1, batchSize = 1000) public class ArrayCopy { @Param({"1","5","10","100", "1000"}) private int size; private int[] ar; @Setup public void setup() { ar = new int[size]; for (int i = 0; i < size; i++) { ar[i] = i;

What does autoplot.microbenchmark actually plot?

﹥>﹥吖頭↗ submitted on 2019-12-21 07:01:22
Question: According to the docs, microbenchmark:::autoplot "Uses ggplot2 to produce a more legible graph of microbenchmark timings." Cool! Let's try the example code: library("ggplot2") tm <- microbenchmark(rchisq(100, 0), rchisq(100, 1), rchisq(100, 2), rchisq(100, 3), rchisq(100, 5), times=1000L) autoplot(tm) I don't see anything about the...squishy undulations in the documentation, but my best guess from this answer by the function creator is that this is like a smoothed series of boxplots of the

Understanding the output of -XX:+PrintCompilation

老子叫甜甜 submitted on 2019-12-20 09:00:04
Question: I am running some microbenchmarks on Java list iteration code. I have used the -XX:+PrintCompilation and -verbose:gc flags to ensure that nothing is happening in the background while the timing is being run. However, I see something in the output which I cannot understand. Here's the code I am running the benchmark on: import java.util.ArrayList; import java.util.List; public class PerformantIteration { private static int theSum = 0; public static void main(String[] args) { System.out.println(

Check of microbenchmark results fails with data.table changed by reference

给你一囗甜甜゛ submitted on 2019-12-20 02:28:16
Question: There are some answers on SO where timings are compared without checking the results. However, I prefer to see whether an expression is both correct and fast. The microbenchmark package supports this with the check parameter. Unfortunately, the check fails on expressions which change a data.table by reference, i.e., the check does not recognize that results are different. Case 1: data.table expressions where check works as expected library(data.table) library(microbenchmark) # minimal data.table

perf enable demangling of callgraph

拈花ヽ惹草 submitted on 2019-12-18 11:25:47
Question: How do I enable C++ demangling for the perf callgraph? It seems to demangle symbols when I go into annotate mode, but not in the main callgraph. Sample code (using Google Benchmark): #include <benchmark/benchmark.h> #include <vector> static __attribute__ ((noinline)) int my_really_big_function() { for(size_t i = 0; i < 1000; ++i) { benchmark::DoNotOptimize(i % 5); } return 0; } static __attribute__ ((noinline)) void caller1() { for(size_t i = 0; i < 1000; ++i) { benchmark::DoNotOptimize(my