benchmarking

LINQ Ring: Any() vs Contains() for Huge Collections

心已入冬 submitted on 2019-11-26 20:28:07
Given a huge collection of objects, is there a performance difference between the following?

Collection.Contains: `myCollection.Contains(myElement)`
Enumerable.Any: `myCollection.Any(currentElement => currentElement == myElement)`

Answer: Contains() is an instance method, and its performance depends largely on the collection itself. For instance, Contains() on a List is O(n), while Contains() on a HashSet is O(1). Any() is an extension method that simply iterates the collection, applying the delegate to every element, so its complexity is O(n). Any() is more flexible, however, since…
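A minimal sketch of how the two calls could be compared on a large collection (the collection size, the search target, and the Stopwatch-based timing are illustrative choices, not from the original question):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class ContainsVsAny
{
    static void Main()
    {
        var list = Enumerable.Range(0, 1_000_000).ToList();
        var set  = new HashSet<int>(list);
        int target = 999_999;                    // worst case for a linear scan

        var sw = Stopwatch.StartNew();
        bool a = list.Contains(target);          // O(n) on List<T>
        Console.WriteLine($"List.Contains    -> {a} in {sw.Elapsed}");

        sw.Restart();
        bool b = list.Any(x => x == target);     // O(n), plus a delegate call per element
        Console.WriteLine($"List.Any         -> {b} in {sw.Elapsed}");

        sw.Restart();
        bool c = set.Contains(target);           // O(1) on HashSet<T>
        Console.WriteLine($"HashSet.Contains -> {c} in {sw.Elapsed}");
    }
}
```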

How can I benchmark C code easily?

一笑奈何 submitted on 2019-11-26 19:37:56
Is there a simple library to benchmark the time it takes to execute a portion of C code? What I want is something like:

```c
int main() {
    benchmarkBegin(0);
    // Do work
    double elapsedMS = benchmarkEnd(0);

    benchmarkBegin(1);
    // Do some more work
    double elapsedMS2 = benchmarkEnd(1);

    // Calculates relative speedup
    double speedup = benchmarkSpeedup(elapsedMS, elapsedMS2);
}
```

It would also be great if the library let you do many runs, averaging them and calculating the variance in timing!

Answer (Joe): Basically, all you want is a high-resolution timer. The elapsed time is of course just a difference in times, and the…
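As a minimal sketch of that "high-resolution timer" idea, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC); the benchmarkBegin/benchmarkEnd names are simply reused from the hypothetical API in the question:

```c
#include <stdio.h>
#include <time.h>

#define MAX_TIMERS 8
static struct timespec timer_start[MAX_TIMERS];

/* Record the start time for timer slot `id`. */
void benchmarkBegin(int id) {
    clock_gettime(CLOCK_MONOTONIC, &timer_start[id]);
}

/* Return milliseconds elapsed since benchmarkBegin(id). */
double benchmarkEnd(int id) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec  - timer_start[id].tv_sec)  * 1000.0 +
           (now.tv_nsec - timer_start[id].tv_nsec) / 1e6;
}

int main(void) {
    benchmarkBegin(0);
    for (volatile long i = 0; i < 10000000; ++i) ;   /* do work */
    double elapsedMS = benchmarkEnd(0);
    printf("That took %.3f ms\n", elapsedMS);
    return 0;
}
```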

Benchmark Linq2SQL, Subsonic2, Subsonic3 - Any other ideas to make them faster?

不羁的心 submitted on 2019-11-26 19:08:51
I have been working with SubSonic 2 for more than 3 years now... After LINQ appeared, and then SubSonic 3, I started thinking about moving to the new LINQ features that connect to SQL. I started porting my SubSonic 2 code to SubSonic 3, and very soon discovered that it was so slow I couldn't believe it, which is what started all these tests. Then I tested Linq2Sql and saw a delay there as well, compared with SubSonic 2. My question, especially for Linq2Sql and the upcoming .NET version 4, is: what else can I do to speed it up? What else in the Linq2Sql settings, or classes, not on this…

How does jsPerf work?

冷暖自知 submitted on 2019-11-26 18:55:48
Question: Today I visited jsPerf and now I am wondering… What is "ops/sec"? How many iterations does it do? On what basis does it calculate which is faster? What is the formula behind these calculations? Example: http://jsperf.com/concatenation-vs-join Can anyone tell me? Thanks in advance.

Answer 1: I wrote Benchmark.js, which jsPerf uses. "ops/sec" stands for operations per second: how many times a test is projected to execute in one second. A test is repeatedly executed until it reaches the…
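To make the concept concrete, here is a crude sketch of an ops/sec estimate. This is illustrative only and is not how Benchmark.js itself works internally; the real library also calibrates iteration counts and reports a statistical margin of error:

```ts
// Run `test` repeatedly for at least `minTimeMs` of wall time,
// then report iterations per second.
function opsPerSec(test: () => void, minTimeMs = 500): number {
  let iterations = 0;
  const start = performance.now();
  while (performance.now() - start < minTimeMs) {
    test();
    iterations++;
  }
  const elapsedSec = (performance.now() - start) / 1000;
  return iterations / elapsedSec;   // operations per second
}

console.log(opsPerSec(() => ["a", "b", "c"].join("")).toFixed(0), "ops/sec");
```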

How do I get monotonic time durations in python?

匆匆过客 submitted on 2019-11-26 18:41:29
I want to log how long something takes in real wall-clock time. Currently I'm doing this:

```python
startTime = time.time()
someSQLOrSomething()
print "That took %.3f seconds" % (time.time() - startTime)
```

But that will fail (produce incorrect results) if the time is adjusted while the SQL query (or whatever it is) is running. I don't want to just benchmark it. I want to log it in a live application in order to see trends on a live system. I want something like clock_gettime(CLOCK_MONOTONIC, ...), but in Python, and preferably without having to write a C module that calls clock_gettime().

Answer (Armin Ronacher): That…
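For reference, Python 3.3+ ships a monotonic clock in the standard library. A minimal sketch of the same logging pattern using it (the Python 2-style print from the question becomes a print() call, and someSQLOrSomething is replaced by a stand-in):

```python
import time

def some_sql_or_something():
    time.sleep(0.1)  # stand-in for the real work

start = time.monotonic()          # not affected by system clock adjustments
some_sql_or_something()
elapsed = time.monotonic() - start
print("That took %.3f seconds" % elapsed)
```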

Why is looping over range() in Python faster than using a while loop?

穿精又带淫゛_ submitted on 2019-11-26 18:28:42
The other day I was doing some Python benchmarking and I came across something interesting. Below are two loops that do more or less the same thing. Loop 1 takes about twice as long as loop 2 to execute.

Loop 1:

```python
i = 0
while i < 100000000:
    i += 1
```

Loop 2:

```python
for n in range(0, 100000000):
    pass
```

Why is the first loop so much slower? I know it's a trivial example, but it's piqued my interest. Is there something special about the range() function that makes it more efficient than incrementing a variable the same way?

Answer (kcwu): See the disassembly of the Python bytecode; you may get a more concrete idea. Use…
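Following that suggestion, a small sketch of how one might compare the bytecode of the two loops with the standard dis module (the loop bound is shortened here purely for illustration):

```python
import dis

def while_loop():
    i = 0
    while i < 1000:
        i += 1          # each pass: load/add/store plus a compare and a jump

def for_loop():
    for n in range(1000):
        pass            # each pass: FOR_ITER and a store, with range driven in C

dis.dis(while_loop)
dis.dis(for_loop)
```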

Timing CUDA operations

一个人想着一个人 submitted on 2019-11-26 18:12:30
Question: I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() on Windows. My problem is that these two approaches give me totally different results. In fact, the result given by events seems huge compared to the actual speed in practice. What I need all this for is to predict the running time of a computation by first running a reduced version of it on a smaller data set.
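For context, a minimal sketch of the event-based timing pattern the Best Practices Guide refers to; the dummyKernel and launch configuration are placeholders, not the asker's actual computation:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                      // placeholder work
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                          // enqueue start marker
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);                           // enqueue stop marker
    cudaEventSynchronize(stop);                      // wait for the kernel to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);          // GPU-side elapsed time, in ms
    printf("kernel took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```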

How to use clock() in C++

自作多情 submitted on 2019-11-26 18:10:29
How do I call clock() in C++? For example, I want to test how much time a linear search takes to find a given element in an array.

```cpp
#include <iostream>
#include <cstdio>
#include <ctime>

int main() {
    std::clock_t start;
    double duration;

    start = std::clock();
    /* Your algorithm here */
    duration = (std::clock() - start) / (double) CLOCKS_PER_SEC;

    std::cout << "printf: " << duration << '\n';
}
```

An alternative solution, which is portable and has higher precision, available since C++11, is to use std::chrono. Here is an example:

```cpp
#include <iostream>
#include <chrono>
typedef std::chrono::high…   // excerpt truncated
```
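Since the chrono excerpt above is cut off, here is a self-contained sketch of the same idea using std::chrono::steady_clock, the monotonic clock intended for interval timing; the linear-search workload is just a stand-in:

```cpp
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data(10000000, 1);
    data.back() = 42;                                  // element to search for

    auto start = std::chrono::steady_clock::now();
    long found = -1;
    for (std::size_t i = 0; i < data.size(); ++i) {    // linear search
        if (data[i] == 42) { found = static_cast<long>(i); break; }
    }
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = end - start;
    std::cout << "found at " << found
              << " after " << elapsed.count() << " s\n";
}
```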

How can I accurately benchmark unaligned access speed on x86_64

情到浓时终转凉″ submitted on 2019-11-26 17:48:51
In an answer, I stated that for a long time now (on x86/x86_64), unaligned access has had almost the same speed as aligned access. I didn't have any numbers to back up this statement, so I've created a benchmark for it. Do you see any flaws in this benchmark? Can you improve on it (I mean, increase GB/sec, so it reflects the truth better)?

```cpp
#include <sys/time.h>
#include <stdio.h>

template <int N> __attribute__((noinline))
void loop32(const char *v) {
    for (int i = 0; i < N; i += 160) {
        __asm__ ("mov      (%0), %%eax" : : "r"(v) : "eax");
        __asm__ ("mov 0x04(%0), %%eax" : : "r"(v) : "eax");
        __asm__ ("mov 0x08(%0), %%eax" : : "r"(v) : "eax");
        // … excerpt truncated …
```
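This is not the asker's inline-asm harness, but as a simplified, portable sketch of the same kind of measurement, one can time 32-bit loads through an aligned and a deliberately misaligned pointer and report GB/s; the buffer size, pass count, and memcpy-based load are all assumptions of this sketch:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Sum 32-bit loads starting at `p` (possibly misaligned) over `bytes` bytes.
// memcpy lets the compiler emit plain unaligned load instructions.
static uint32_t sum32(const char *p, size_t bytes) {
    uint32_t acc = 0;
    for (size_t i = 0; i + 4 <= bytes; i += 4) {
        uint32_t v;
        std::memcpy(&v, p + i, 4);
        acc += v;
    }
    return acc;
}

int main() {
    const size_t bytes  = 32 * 1024;                 // small enough to stay in L1 cache
    const int    passes = 100000;
    std::vector<char> buf(bytes + 64, 1);

    for (size_t offset : {size_t(0), size_t(1)}) {   // aligned vs. misaligned start
        uint32_t sink = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < passes; ++r)
            sink += sum32(buf.data() + offset, bytes);
        auto t1 = std::chrono::steady_clock::now();
        double sec = std::chrono::duration<double>(t1 - t0).count();
        std::printf("offset %zu: %.2f GB/s (sink=%u)\n",
                    offset, double(bytes) * passes / sec / 1e9, sink);
    }
    return 0;
}
```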

Python Requests vs PyCurl Performance

霸气de小男生 submitted on 2019-11-26 17:31:57
Question: How does the Requests library compare with PyCurl performance-wise? My understanding is that Requests is a Python wrapper around urllib, whereas PyCurl is a Python wrapper around libcurl, which is native, so PyCurl should get better performance, but I'm not sure by how much. I can't find any comparative benchmarks.

Answer 1: I wrote you a full benchmark, using a trivial Flask application backed by gUnicorn/meinheld + nginx (for performance and HTTPS), and seeing how long it takes to complete 10,000…
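A much smaller sketch of such a comparison, assuming a local test server is already running at the URL below: it times N GET requests with requests (reusing one Session) and with PyCurl (reusing one Curl handle), so both benefit from connection keep-alive:

```python
import io
import time

import pycurl
import requests

URL = "http://localhost:8000/"   # assumed local test server, e.g. a trivial Flask app
N = 1000

# requests, reusing one Session so the TCP connection is kept alive
session = requests.Session()
start = time.perf_counter()
for _ in range(N):
    session.get(URL)
print("requests: %.2f s" % (time.perf_counter() - start))

# pycurl, reusing one Curl handle (connection likewise kept alive)
curl = pycurl.Curl()
start = time.perf_counter()
for _ in range(N):
    buf = io.BytesIO()
    curl.setopt(pycurl.URL, URL)
    curl.setopt(pycurl.WRITEDATA, buf)
    curl.perform()
print("pycurl:   %.2f s" % (time.perf_counter() - start))
curl.close()
```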