microbenchmark

C++ uint_fast32_t resolves to uint64_t but is slower for nearly all operations than uint32_t (x86_64). Why does it resolve to uint64_t?

久未见 posted on 2021-02-16 13:58:29
Question: I ran a benchmark and found that uint_fast32_t is 8 bytes wide but slower than the 4-byte uint32_t for nearly all operations. If that is the case, why does uint_fast32_t not stay at 4 bytes?

OS info:

    5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

CPU info:

    cat /sys/devices/cpu/caps/pmu_name
    skylake
    model name : Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

Benchmark I used for testing:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <cstdint>

Is the difference between these two evals explained with constant folding?

被刻印的时光 ゝ posted on 2021-02-09 11:10:41
Question: Given these two evals, which differ only in calling Module::FOO() versus FOO():

    # Symbols imported, and used locally.
    eval qq[
      package Foo$num;
      Module->import();
      my \$result = Module::FOO() * Module::FOO();
    ] or die $@;

    # Symbols imported, not used locally referencing parent symbol.
    eval qq[
      package Foo$num;
      Module->import();
      my \$result = FOO() * FOO();
    ] or die $@;

Why would the top block take up substantially less space? The script and output are reproduced below.

Script:

    package Module {
      use v5.30;
      use

Is the difference between these two evals explained with constant folding?

家住魔仙堡 posted on 2021-02-09 11:09:33
Question: Given these two evals, which differ only in calling Module::FOO() versus FOO():

    # Symbols imported, and used locally.
    eval qq[
      package Foo$num;
      Module->import();
      my \$result = Module::FOO() * Module::FOO();
    ] or die $@;

    # Symbols imported, not used locally referencing parent symbol.
    eval qq[
      package Foo$num;
      Module->import();
      my \$result = FOO() * FOO();
    ] or die $@;

Why would the top block take up substantially less space? The script and output are reproduced below.

Script:

    package Module {
      use v5.30;
      use

Strange behavior in sun.misc.Unsafe.compareAndSwap measurement via JMH

徘徊边缘 posted on 2021-02-07 07:37:34
Question: I've decided to measure increment operations under different locking strategies, using JMH for this purpose. I'm using JMH to check throughput and average time, along with a simple custom test to check correctness. The strategies compared are: atomic count; ReadWrite locking count; synchronizing with volatile; synchronizing block without volatile; sun.misc.Unsafe.compareAndSwap; sun.misc.Unsafe.getAndAdd; unsynchronized count.

Benchmark code:

    @State(Scope.Benchmark)
    @BenchmarkMode({Mode.Throughput,
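
The "simple custom test for checking correctness" mentioned above is not shown in the excerpt. The following is only a rough sketch of what such a check could look like, not the asker's code. All class and method names here are hypothetical, and `AtomicLong.compareAndSet` is used as a stand-in for `sun.misc.Unsafe.compareAndSwap` so that the example runs on a stock JDK without JMH.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterStrategies {
    // Hypothetical common interface for the counters being compared.
    interface Counter {
        void increment();
        long get();
    }

    // Atomic count: a single atomic read-modify-write per increment.
    static final class AtomicCounter implements Counter {
        private final AtomicLong value = new AtomicLong();
        public void increment() { value.incrementAndGet(); }
        public long get() { return value.get(); }
    }

    // Explicit compare-and-set retry loop (stand-in for Unsafe.compareAndSwap).
    static final class CasCounter implements Counter {
        private final AtomicLong value = new AtomicLong();
        public void increment() {
            long current;
            do {
                current = value.get();
            } while (!value.compareAndSet(current, current + 1));
        }
        public long get() { return value.get(); }
    }

    // Synchronized count: every access goes through the object's monitor.
    static final class SynchronizedCounter implements Counter {
        private long value;
        public synchronized void increment() { value++; }
        public synchronized long get() { return value; }
    }

    // Runs `threads` workers, each incrementing `perThread` times, and
    // returns the final count. A correct strategy yields threads * perThread.
    static long run(Counter counter, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join() makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("atomic:       " + run(new AtomicCounter(), 4, 100_000));
        System.out.println("cas loop:     " + run(new CasCounter(), 4, 100_000));
        System.out.println("synchronized: " + run(new SynchronizedCounter(), 4, 100_000));
    }
}
```

Each strategy should report 400000 here. An unsynchronized counter run through the same harness would typically report less, which is what a correctness check of this kind is meant to expose.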

How should I go about finding the number of pipeline stages in my laptop's CPU?

浪尽此生 posted on 2020-12-23 08:20:25
Question: I want to look into how the latest processors differ from the standard RISC-V implementation (RISC-V having a 5-stage pipeline: fetch, decode, execute (ALU), memory, write back), but I can't work out how to start finding out how pipelining is implemented in my current processor. I tried referring to Intel's documentation for the i7-4510U, but it was not much help.

Answer 1: Haswell's pipeline length is reportedly 14 stages (on a uop-cache hit), 19 stages when fetching from L1i for

How to run methods in benchmarks sequentially with JMH?

风流意气都作罢 posted on 2020-12-13 05:11:03
Question: In my scenario, the methods in the benchmark should run sequentially in one thread and modify the state in order. For example, there is a List<Integer> called num in the benchmark class. What I want is: first, run add() to append a number to the list. Then, run remove() to remove the number from the list. The calling sequence must be add() --> remove(). If remove() runs before add(), or if they run concurrently, they would raise exceptions because there is no element in the list. That is, add() and
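
One way to guarantee the add() --> remove() order described above is to call both from a single method, so that one invocation always performs the pair in sequence on one thread. The sketch below illustrates that idea in plain Java with no JMH dependency. The class name and the `addThenRemove()` method are hypothetical; in a real JMH class, `addThenRemove()` would presumably be the only method annotated with `@Benchmark`, with the list held in per-thread state.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialListBenchmark {
    // The list named `num` in the question; one instance per thread is assumed.
    private final List<Integer> num = new ArrayList<>();

    // Appends a number to the list.
    void add() {
        num.add(42);
    }

    // Removes the last element; throws if the list is empty,
    // which is why it must never run before add().
    void remove() {
        num.remove(num.size() - 1); // remove(int index) overload
    }

    // Hypothetical combined method: the only entry point that is measured,
    // so the order add() then remove() is fixed by construction.
    int addThenRemove() {
        add();
        remove();
        return num.size(); // returning a value keeps the work observable
    }

    public static void main(String[] args) {
        SequentialListBenchmark bench = new SequentialListBenchmark();
        for (int i = 0; i < 1_000_000; i++) {
            bench.addThenRemove();
        }
        System.out.println("size after run: " + bench.num.size());
    }
}
```

The trade-off of this design is that the two operations are timed together rather than separately, so it only fits cases where the combined cost is what needs measuring.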

How to run methods in benchmarks sequentially with JMH?

断了今生、忘了曾经 posted on 2020-12-13 05:10:11
Question: In my scenario, the methods in the benchmark should run sequentially in one thread and modify the state in order. For example, there is a List<Integer> called num in the benchmark class. What I want is: first, run add() to append a number to the list. Then, run remove() to remove the number from the list. The calling sequence must be add() --> remove(). If remove() runs before add(), or if they run concurrently, they would raise exceptions because there is no element in the list. That is, add() and