benchmarking | 易学教程

Why is 2 * (i * i) faster than 2 * i * i in Java?

The following Java program takes on average between 0.50 s and 0.55 s to run:

```java
public static void main(String[] args) {
    long startTime = System.nanoTime();
    int n = 0;
    for (int i = 0; i < 1000000000; i++) {
        n += 2 * (i * i);
    }
    System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
    System.out.println("n = " + n);
}
```

If I replace 2 * (i * i) with 2 * i * i, it takes between 0.60 s and 0.65 s to run. How come?

I ran each version of the program 15 times, alternating between the two. Here are the results:

 2*(i*i)  |  2*i*i
----------+----------
0.5183738 | 0.6246434
0
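A minimal sketch of how the two expressions could be timed side by side in a single JVM run. The class name `MulOrderBench`, the warm-up pass, and the number of rounds are assumptions for illustration and are not part of the original program. Because `int` arithmetic wraps around on overflow, both expressions produce the same sum, which the sketch also checks. A hand-rolled loop like this is only a rough indicator; a dedicated harness would give more trustworthy numbers.

```java
public class MulOrderBench {
    // Sum of 2 * (i * i) for i in [0, limit); int overflow wraps silently.
    static int sumParenthesized(int limit) {
        int n = 0;
        for (int i = 0; i < limit; i++) {
            n += 2 * (i * i);
        }
        return n;
    }

    // Same sum written as 2 * i * i, which Java parses as (2 * i) * i.
    static int sumLeftToRight(int limit) {
        int n = 0;
        for (int i = 0; i < limit; i++) {
            n += 2 * i * i;
        }
        return n;
    }

    public static void main(String[] args) {
        int limit = 1_000_000_000;

        // Warm-up pass (an assumption of this sketch) so both methods
        // have a chance to be JIT-compiled before timing starts.
        sumParenthesized(limit / 10);
        sumLeftToRight(limit / 10);

        // Alternate between the two variants, as the question does.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int a = sumParenthesized(limit);
            long t1 = System.nanoTime();
            int b = sumLeftToRight(limit);
            long t2 = System.nanoTime();
            System.out.printf("2*(i*i): %.3f s   2*i*i: %.3f s   same result: %b%n",
                    (t1 - t0) / 1e9, (t2 - t1) / 1e9, a == b);
        }
    }
}
```

Keeping each variant in its own method lets the JIT compile them separately, so one loop's optimization is less likely to influence the other.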

How to benchmark memory usage of a function?

Question: I notice that Rust's test has a benchmark mode that measures execution time in ns/iter, but I could not find a way to measure memory usage. How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would certainly also be interesting). Edit: I found this issue which asks for the exact same thing.

Answer 1: With Rust 1.0 and 1.1 you could use the libc crate in order to print the jemalloc statistics: #![feature(libc

Capturing (externally) the memory consumption of a given Callback

The Problem

Let's say I have this function:

```php
function hog($i = 1) // uses $i * 0.5 MiB, returns $i * 0.25 MiB
{
    $s = str_repeat('a', $i * 1024 * 512);
    return substr($s, $i * 1024 * 256);
}
```

I would like to call it and be able to inspect the maximum amount of memory it uses. In other words: memory_get_function_peak_usage($callback);. Is this possible?

What I Have Tried

I'm using the following values as my non-monotonically increasing $i argument for hog():

```php
$iterations = array_merge(range(0, 50, 10), range(50, 0, 5));
$iterations = array_fill_keys($iterations, 0);
```

Which is essentially: ( [0] => 0

Benchmarking affected by VCL

Today I ported my old memory benchmark from Borland C++ Builder 5.0 to BDS2006 Turbo C++ and found something weird:

- The exe from BCB5 runs fine and is stable.
- The exe from BDS2006 measures correctly only before the main form is started (inside its constructor). If the benchmark is started again after the main form is activated, or even after any VCL component change (for example, the Caption of the main form), the speed of the benchmark thread is strongly affected.

After some research I found out that:

- It does not matter whether the test runs inside a thread or not.
- The process/thread priority and affinity do not affect this either.
- Hide of any

How to measure Disk Speed in Java for Benchmarking

I would like to know how you can measure disk speed using the Java API: random read, sequential read, and random and sequential write. If someone thinks this is not a real question, please explain why before closing it. Thanks.

You can take a look at a disk utility I wrote in Java. It may not be super fancy, but it works. https://sourceforge.net/projects/jdiskmark/ Here is a snippet of the write measurement code:

```java
try (RandomAccessFile rAccFile = new RandomAccessFile(testFile,mode)) {
    for (int b=0; b<numOfBlocks; b++) {
        if (App.randomEnable) {
            int rLoc = Util.randInt(0, numOfBlocks-1);
            rAccFile.seek(rLoc
```
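Since the snippet above is cut off, here is a separate, minimal sketch of a sequential write measurement with `RandomAccessFile`. It is not the jdiskmark code: the class name `SeqWriteProbe`, the helper `runOnTempFile`, the block count, and the block size are all assumptions chosen for illustration. The `"rwd"` mode asks for file content updates to be written synchronously to the storage device, which reduces (but may not eliminate) the effect of OS caching on the result.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class SeqWriteProbe {
    // Writes numOfBlocks blocks of blockSize bytes one after another
    // and returns the observed throughput in MiB per second.
    static double measureSequentialWrite(File testFile, int numOfBlocks, int blockSize)
            throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.nanoTime();
        // "rwd": content updates are written synchronously to the device.
        try (RandomAccessFile raf = new RandomAccessFile(testFile, "rwd")) {
            for (int b = 0; b < numOfBlocks; b++) {
                raf.write(block);
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        double mebibytes = (double) numOfBlocks * blockSize / (1024.0 * 1024.0);
        return mebibytes / (elapsedNanos / 1e9);
    }

    // Hypothetical convenience wrapper: runs the measurement on a
    // temporary file and deletes the file afterwards.
    static double runOnTempFile(int numOfBlocks, int blockSize) {
        File testFile = null;
        try {
            testFile = File.createTempFile("seqwrite", ".bin");
            return measureSequentialWrite(testFile, numOfBlocks, blockSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (testFile != null) {
                testFile.delete();
            }
        }
    }

    public static void main(String[] args) {
        // 64 blocks of 1 MiB each; both values are arbitrary for this sketch.
        double mibPerSec = runOnTempFile(64, 1024 * 1024);
        System.out.printf("sequential write: %.1f MiB/s%n", mibPerSec);
    }
}
```

A random-write variant would call `seek` with a randomly chosen block offset before each `write`, as the truncated snippet above appears to do.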

Poor maths performance in C vs Python/numpy

Near-duplicate / related: How does BLAS get such extreme performance? (If you want fast matmul in C, seriously just use a good BLAS library unless you want to hand-tune your own asm version.) But that doesn't mean it's not interesting to see what happens when you compile less-optimized matrix code. Other related questions: How to optimize matrix multiplication (matmul) code to run fast on a single processor core; Matrix Multiplication with blocks.

Out of interest, I decided to compare the performance of (inexpertly) handwritten C vs. Python/numpy performing a simple matrix multiplication of two large, square matrices

How to set a variable that represents a time in the future in absolute terms Objective-C

Motivation: I'm working on an app that makes several (client) phones read audio data from a (server) phone. The idea is that they must all play the song at exactly the same time.

Premise: I must figure out a way to make all phones start at a certain timestamp that is absolute (i.e. not relative to the user-set clock of either phone). Based on some research, I figured the best way to do this is to use CFAbsoluteTimeGetCurrent(). The idea here is that I get the latency it takes the server to communicate with each phone (because GKSession is apparently done serially, not in parallel),

Why is iterating through `std::vector` faster than iterating through `std::array`?

I recently asked this question: Why is iterating an std::array much faster than iterating an std::vector? As people quickly pointed out, my benchmark had many flaws. So as I was trying to fix my benchmark, I noticed that std::vector wasn't slower than std::array and, in fact, it was quite the opposite.

```cpp
#include <vector>
#include <array>
#include <stdio.h>
#include <chrono>

using namespace std;

constexpr int n = 100'000'000;
vector<int> v(n);
//array<int, n> v;

int main()
{
    int res = 0;
    auto start = chrono::steady_clock::now();
    for(int x : v)
        res += x;
    auto end = chrono::steady_clock::now();
```

Java regex performance

I'm trying to parse links using regular expressions in Java, but I think it's too slow. For example, extracting all links from http://news.google.com.ar/nwshp?hl=es&tab=wn takes 34642 milliseconds (34 seconds!).

Here is the regex:

```java
private final String regexp = "<a.*?\\shref\\s*=\\s*([\\\"\\']*)(.*?)([\\\"\\'\\s].*?>|>)";
```

The flags for the pattern:

```java
private static final int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL |Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.CANON_EQ;
```

And the code may be something like this:

```java
private void processURL(URL url){
    URLConnection connection;
```
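The excerpt is cut off before the code that runs the pattern, so the actual cause of the slowdown cannot be read from it. As a point of comparison only, here is a sketch that compiles a simpler pattern once and reuses it. The class name `LinkExtractor` and the pattern itself are assumptions, not the author's code: the pattern handles only quoted `href` values, uses negated character classes instead of lazy `.*?` groups, and sets only `Pattern.CASE_INSENSITIVE`. Timing it against the original pattern on the same input might help narrow down whether the pattern and flags matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Hypothetical simplified pattern: matches <a ... href="..."> or
    // <a ... href='...'> and captures the quoted value in group 1.
    // Compiled once and shared, since Pattern instances are immutable.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'>]*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every captured href value in document order.
    static List<String> extract(CharSequence html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">x</a> "
                + "<A class='c' HREF='/b'>y</A>";
        long start = System.nanoTime();
        List<String> links = extract(html);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(links + " in " + elapsedMicros + " us");
    }
}
```

This sketch deliberately ignores unquoted attribute values and other HTML edge cases; for real pages an HTML parser is generally a more robust choice than a regex.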

Same program faster on Linux than Windows — why?

The solution to this was found in the question Executable runs faster on Wine than Windows -- why? Glibc's floor() is probably implemented in terms of system libraries.

I have a very small C++ program (~100 lines) for a physics simulation. I have compiled it with gcc 4.6.1 on both Ubuntu Oneiric and Windows XP on the same computer. I used precisely the same command line options (same makefile). Strangely, on Ubuntu, the program finishes much faster than on Windows (~7.5 s vs 13.5 s). At this point I thought it was a compiler difference (despite using the same version). But even more strangely,