microbenchmark | 易学教程

Why might this SIMD array-adding sample not be demonstrating any performance gains over a naive implementation?

Question:

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Vector.IsHardwareAccelerated ? "SIMD supported" : "SIMD not supported.");
            var rand = new Random();
            var numNums = 10000000;
            var arr1 = Enumerable.Repeat(0, numNums).Select(x => (int) (rand.NextDouble() * 100)).ToArray();
            var arr2 = Enumerable.Repeat(0, numNums).Select(x => (int) (rand.NextDouble() * 100)).ToArray();
            var simdResult = new int[numNums];
            var conventionalResult = new int[numNums];
            var watch = System.Diagnostics

Why is String concatenation faster than String.valueOf for converting an Integer to a String?

Question: I have a benchmark:

    @BenchmarkMode(Mode.Throughput)
    @Fork(1)
    @State(Scope.Thread)
    @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
    @Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
    public class StringConcatTest {
        private int aInt;

        @Setup
        public void prepare() {
            aInt = 100;
        }

        @Benchmark
        public String emptyStringInt() {
            return "" + aInt;
        }

        @Benchmark
        public String valueOfInt() {
            return String.valueOf(aInt);
        }
    }

And here is

Measure precision of timer (e.g. Stopwatch/QueryPerformanceCounter)

Question: Given that the Stopwatch class in C# can use something like three different timers underneath, e.g.:

- System timer, e.g. with a precision of approx. ±10 ms; depending on the timer resolution, which can be set with timeBeginPeriod, it can be approx. ±1 ms.
- Time Stamp Counter (TSC), e.g. with a tick frequency of 2.5 MHz, or 1 tick = 400 ns, so ideally a precision of that.
- High Precision Event Timer (HPET), e.g. with a tick frequency of 25 MHz, or 1 tick = 40 ns, so ideally a precision of that.

how can we measure the
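One common way to estimate a timer's effective granularity is to read it in a tight loop and record the smallest nonzero step between consecutive readings. The sketch below is a Java analogue built on `System.nanoTime()`, not the C# `Stopwatch` the question asks about; the class and method names (`TimerGranularity`, `smallestTick`) are made up for illustration. The result is only an upper bound on the tick length, since it also includes the cost of the call itself.

```java
// Illustrative sketch: names are hypothetical, and System.nanoTime() stands in
// for whichever high-resolution timer is under test.
class TimerGranularity {

    // Returns the smallest positive step observed between two consecutive
    // clock readings, in nanoseconds, over the given number of samples.
    static long smallestTick(int samples) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            long now;
            // Spin until the reported time visibly advances.
            do {
                now = System.nanoTime();
            } while (now - start <= 0);
            best = Math.min(best, now - start);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Smallest observed tick: " + smallestTick(100_000) + " ns");
    }
}
```

Taking the minimum over many samples filters out readings inflated by scheduling or interrupts; the value still varies by machine and JVM, so no expected output is given.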

System.arraycopy with constant length

Question: I'm playing around with JMH (http://openjdk.java.net/projects/code-tools/jmh/) and I just stumbled on a strange result. I'm benchmarking ways to make a shallow copy of an array and I can observe the expected results (that looping through the array is a bad idea and that there is no significant difference between #clone(), System#arraycopy() and Arrays#copyOf(), performance-wise). Except that System#arraycopy() is one-quarter slower when the array's length is hard-coded... Wait, what? How
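For reference, the four copy strategies named in the question can be written as below. This is a minimal sketch rather than the asker's benchmark code: the class and method names (`ShallowCopies`, `viaLoop`, and so on) are hypothetical, and it only shows that the approaches yield equal but distinct arrays, not how they compare in speed.

```java
import java.util.Arrays;

// Illustrative sketch: class and method names are hypothetical.
class ShallowCopies {

    // Element-by-element copy in a plain loop.
    static int[] viaLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Object#clone() on the array itself.
    static int[] viaClone(int[] src) {
        return src.clone();
    }

    // System.arraycopy into a freshly allocated array.
    static int[] viaArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Arrays.copyOf, which allocates and copies in one call.
    static int[] viaCopyOf(int[] src) {
        return Arrays.copyOf(src, src.length);
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        System.out.println(Arrays.equals(src, viaLoop(src)));
        System.out.println(Arrays.equals(src, viaClone(src)));
        System.out.println(Arrays.equals(src, viaArraycopy(src)));
        System.out.println(Arrays.equals(src, viaCopyOf(src)));
    }
}
```

To compare their performance, each method would be wrapped in a JMH `@Benchmark` method, as the question describes.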

Why is Arrays.copyOf 2 times faster than System.arraycopy for small arrays?

Question: I was recently playing with some benchmarks and found very interesting results that I can't explain right now. Here is the benchmark:

    @BenchmarkMode(Mode.Throughput)
    @Fork(1)
    @State(Scope.Thread)
    @Warmup(iterations = 10, time = 1, batchSize = 1000)
    @Measurement(iterations = 10, time = 1, batchSize = 1000)
    public class ArrayCopy {
        @Param({"1", "5", "10", "100", "1000"})
        private int size;
        private int[] ar;

        @Setup
        public void setup() {
            ar = new int[size];
            for (int i = 0; i < size; i++) {
                ar[i] = i;

What does autoplot.microbenchmark actually plot?

Question: According to the docs, microbenchmark:::autoplot "Uses ggplot2 to produce a more legible graph of microbenchmark timings." Cool! Let's try the example code:

    library("ggplot2")
    tm <- microbenchmark(rchisq(100, 0), rchisq(100, 1), rchisq(100, 2),
                         rchisq(100, 3), rchisq(100, 5), times=1000L)
    autoplot(tm)

I don't see anything about the... squishy undulations in the documentation, but my best guess from this answer by the function creator is that this is like a smoothed series of boxplots of the

Understanding the output of -XX:+PrintCompilation

Question: I am running some microbenchmarks on Java list iteration code. I have used the -XX:+PrintCompilation and -verbose:gc flags to ensure that nothing is happening in the background while the timing is being run. However, I see something in the output which I cannot understand. Here's the code I am running the benchmark on:

    import java.util.ArrayList;
    import java.util.List;

    public class PerformantIteration {
        private static int theSum = 0;

        public static void main(String[] args) {
            System.out.println(

Check of microbenchmark results fails with data.table changed by reference

Question: There are some answers on SO where timings are compared without checking the results. However, I prefer to see whether an expression is both correct and fast. The microbenchmark package supports this with the check parameter. Unfortunately, the check fails on expressions which change a data.table by reference, i.e., the check does not recognize that the results are different.

Case 1: data.table expressions where check works as expected

    library(data.table)
    library(microbenchmark)
    # minimal data.table

perf enable demangling of callgraph

Question: How do I enable C++ demangling for the perf callgraph? It seems to demangle symbols when I go into annotate mode, but not in the main callgraph. Sample code (using Google Benchmark):

    #include <benchmark/benchmark.h>
    #include <vector>

    static __attribute__ ((noinline)) int my_really_big_function() {
        for(size_t i = 0; i < 1000; ++i) {
            benchmark::DoNotOptimize(i % 5);
        }
        return 0;
    }

    static __attribute__ ((noinline)) void caller1() {
        for(size_t i = 0; i < 1000; ++i) {
            benchmark::DoNotOptimize(my