micro-optimization

Improving the Quick sort

痞子三分冷 submitted on 2019-12-03 12:13:13
Question: If possible, how can I improve the performance of the following quick sort? Any suggestions?

    void main()
    {
        quick(a,0,n-1);
    }

    void quick(int a[],int lower,int upper)
    {
        int loc;
        if(lower<upper)
        {
            loc=partition(a,lower,upper);
            quick(a,lower,loc-1);
            quick(a,loc+1,upper);
        }
    }

    /* Return type: int
       Parameters passed: Unsorted array and its lower and upper bounds */
    int partition(int a[],int lower,int upper)
    {
        int pivot,i,j,temp;
        pivot=a[lower];
        i=lower+1;
        j=upper;
        while(i<j)
        {
            while((i<upper)&&(a[i]<

Why are DateTime.Now DateTime.UtcNow so slow/expensive

蹲街弑〆低调 submitted on 2019-12-03 10:46:50
I realize this is way too far into the micro-optimization area, but I am curious to understand why calls to DateTime.Now and DateTime.UtcNow are so "expensive". I have a sample program that runs a couple of scenarios of doing some "work" (adding to a counter) and attempts to do this for 1 second. I have several approaches to making it do the work for a limited amount of time. The examples show that DateTime.Now and DateTime.UtcNow are significantly slower than Environment.TickCount, but even that is slow compared to just letting a separate thread sleep for 1 second and then setting a value

Does the order of case in Switch statement can vary the performance?

。_饼干妹妹 submitted on 2019-12-03 09:57:10
Let's say I have a switch statement as below:

    switch(alphabet)
    {
        case "f":
            //do something
            break;
        case "c":
            //do something
            break;
        case "a":
            //do something
            break;
        case "e":
            //do something
            break;
    }

Now suppose I know that "e" occurs most frequently, followed by "a", "c" and "f" respectively. So I restructured the order of the case statements as follows:

    switch(alphabet)
    {
        case "e":
            //do something
            break;
        case "a":
            //do something
            break;
        case "c":
            //do something
            break;
        case "f":
            //do something
            break;
    }

Will the second switch statement be faster than the first switch statement?

Fast search of some nibbles in two ints at same offset (C, microoptimisation)

陌路散爱 submitted on 2019-12-03 06:45:03
My task is to check (more than trillions of times) whether two ints contain any of the predefined pairs of nibbles (first pair: 0x2 and 0x7; second pair: 0xd and 0x8). For example:

    offset:       12345678
    first int:  0x3d542783    first pair: 0x2    second pair: 0xd
    second int: 0x486378d9                0x7                 0x8
                   ^  ^

In this example I have marked the two offsets where a needed pair occurs (offsets 2 and 5, but not 7). The actual offsets and the number of pairs found are not needed in my task. So, for two given ints, the question is: do they contain any of these pairs of nibbles at the same offset? I checked my program, this part is the hottest place (
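One possible branch-free approach to the check described above can be sketched in C. This is an illustration only, not taken from the question or its answers: it assumes the two values are 32-bit unsigned integers, and the function names `has_zero_nibble`, `has_pair` and `contains_any_pair` are hypothetical. The idea is to XOR each value with its target nibble repeated eight times, so that a matching nibble becomes zero, OR the two results together, and then test for a zero nibble with the usual "has zero field" bit trick.

```c
#include <stdint.h>

/* Nonzero if any 4-bit nibble of x is zero (nibble-sized variant of the
   classic "has zero byte" trick). */
static int has_zero_nibble(uint32_t x)
{
    return ((x - 0x11111111u) & ~x & 0x88888888u) != 0;
}

/* Nonzero if, at some nibble offset, a holds nibble na AND b holds nibble nb.
   na and nb must be in the range 0x0..0xF. */
static int has_pair(uint32_t a, uint32_t b, uint32_t na, uint32_t nb)
{
    uint32_t ma = a ^ (na * 0x11111111u); /* zero nibble where a matches na */
    uint32_t mb = b ^ (nb * 0x11111111u); /* zero nibble where b matches nb */
    return has_zero_nibble(ma | mb);      /* zero only where both match */
}

/* The two pairs from the question: (0x2, 0x7) and (0xd, 0x8). */
int contains_any_pair(uint32_t a, uint32_t b)
{
    return has_pair(a, b, 0x2u, 0x7u) || has_pair(a, b, 0xDu, 0x8u);
}
```

For the example values, `contains_any_pair(0x3d542783, 0x486378d9)` returns nonzero, while the reversed combination at offset 7 (0x8 over 0xd) does not count as a match on its own. Whether this beats a lookup table or SIMD on a given machine would need to be measured.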

Java: if-return-if-return vs if-return-elseif-return

随声附和 submitted on 2019-12-03 04:26:12
I asked an unrelated question where I had code like this:

    public boolean equals(Object obj)
    {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        // Check property values
    }

I got a comment which claimed that this was not optimal, and that it instead (if I understood correctly) should do this:

    public boolean equals(Object obj)
    {
        if (this == obj)
            return true;
        else if (obj == null)
            return false;
        else if (getClass() != obj.getClass())
            return false;
        // Check property values
    }

Because of the return statements, I can't really see why any of them

Why does Intel's compiler prefer NEG+ADD over SUB?

你说的曾经没有我的故事 submitted on 2019-12-03 04:06:21
Question: In examining the output of various compilers for a variety of code snippets, I've noticed that Intel's C compiler (ICC) has a strong tendency to prefer emitting a pair of NEG + ADD instructions where other compilers would use a single SUB instruction. As a simple example, consider the following C code:

    uint64_t Mod3(uint64_t value)
    {
        return (value % 3);
    }

ICC translates this to the following machine code (regardless of optimization level):

    mov rcx, 0xaaaaaaaaaaaaaaab
    mov rax, rdi
    mul rcx
    shr
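For readers unfamiliar with the constant 0xaaaaaaaaaaaaaaab in the listing above: it is the usual reciprocal used to divide a 64-bit value by 3 with a multiplication instead of a divide instruction. The following C sketch spells out that arithmetic. It is an illustration only, not the compiler's actual output; the name `mod3_by_multiply` is hypothetical, and the code assumes a compiler that provides the `unsigned __int128` extension (GCC and Clang do).

```c
#include <stdint.h>

/* Remainder of value / 3 computed without a divide instruction.
   Assumes the unsigned __int128 extension is available. */
uint64_t mod3_by_multiply(uint64_t value)
{
    /* 0xAAAAAAAAAAAAAAAB is ceil(2^65 / 3). */
    const uint64_t magic = UINT64_C(0xAAAAAAAAAAAAAAAB);

    /* High 64 bits of the 128-bit product (what MUL leaves in RDX). */
    uint64_t high = (uint64_t)(((unsigned __int128)value * magic) >> 64);

    /* Shifting right by one more bit yields value / 3. */
    uint64_t quotient = high >> 1;

    /* Remainder = value - 3 * quotient. */
    return value - quotient * 3;
}
```

The final subtraction is the kind of step where a compiler can choose between a single SUB and a NEG followed by an ADD.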

C++ Adding 2 arrays together quickly

坚强是说给别人听的谎言 submitted on 2019-12-03 03:21:12
Given the arrays:

    int canvas[10][10];
    int addon[10][10];

where all the values range from 0 to 100, what is the fastest way in C++ to add those two arrays so that each cell in canvas equals itself plus the corresponding cell value in addon? I.e., I want to achieve something like:

    canvas += another;

So if canvas[0][0] = 3 and addon[0][0] = 2, then canvas[0][0] = 5. Speed is essential here, as I am writing a very simple program to brute-force a knapsack-type problem and there will be tens of millions of combinations. And as a small extra question (thanks if you can help!) what would be the fastest way of
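As a baseline for the operation described above, a plain nested loop is a reasonable starting point. The sketch below is written in C (it also compiles as C++); the names `ROWS`, `COLS` and `add_into` are hypothetical and not from the question. With fixed 10x10 bounds and contiguous storage, an optimizing compiler can often unroll or vectorize such a loop on its own, so it is worth measuring this version before trying anything more elaborate.

```c
enum { ROWS = 10, COLS = 10 };

/* Add every cell of addon into the matching cell of canvas. */
void add_into(int canvas[ROWS][COLS], int addon[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            canvas[r][c] += addon[r][c];
        }
    }
}
```

Calling `add_into(canvas, addon)` with `canvas[0][0] = 3` and `addon[0][0] = 2` leaves `canvas[0][0]` equal to 5, as in the question's example.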

Why does n++ execute faster than n=n+1?

有些话、适合烂在心里 submitted on 2019-12-03 02:42:31
Question: In the C language, why does n++ execute faster than n=n+1? (int n=...; n++;) (int n=...; n=n+1;) Our instructor asked that question in today's class. (This is not homework.)
Answer 1: That would be true if you are working on a "stone-age" compiler... In the "stone-age" case: ++n is faster than n++, which is faster than n=n+1. Machines usually have an "increment x" instruction as well as an "add const to x" instruction. In the case of n++, you will have only 2 memory accesses (read n, inc n, write n). In the case of n=n+1, you will have 3 memory access
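One way to test the claim on a current compiler is to put both forms into small functions and compare the generated assembly. The sketch below is only an illustration; the function names `inc_postfix` and `inc_assign` are hypothetical.

```c
/* Increment written with the postfix operator. */
int inc_postfix(int n)
{
    n++;
    return n;
}

/* The same increment written as an explicit addition. */
int inc_assign(int n)
{
    n = n + 1;
    return n;
}
```

Compiling this with, for example, `gcc -O2 -S file.c` and reading the resulting `file.s` lets you check whether the two functions differ at all. With an optimizing compiler they typically come out identical, which is consistent with the answer's point that the difference only matters for a "stone-age" compiler.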

Improving the Quick sort

人走茶凉 submitted on 2019-12-03 02:37:28
If possible, how can I improve the performance of the following quick sort? Any suggestions?

    void main()
    {
        quick(a,0,n-1);
    }

    void quick(int a[],int lower,int upper)
    {
        int loc;
        if(lower<upper)
        {
            loc=partition(a,lower,upper);
            quick(a,lower,loc-1);
            quick(a,loc+1,upper);
        }
    }

    /* Return type: int
       Parameters passed: Unsorted array and its lower and upper bounds */
    int partition(int a[],int lower,int upper)
    {
        int pivot,i,j,temp;
        pivot=a[lower];
        i=lower+1;
        j=upper;
        while(i<j)
        {
            while((i<upper)&&(a[i]<=pivot))
                i++;
            while((a[j]>pivot))
                j--;
            if(i<j)
            {
                temp=a[i];
                a[i]=a[j];
                a[j]=temp;
            }
        }//end while
        if
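Three commonly suggested improvements to a quick sort like the one above are a median-of-three pivot (to avoid the worst case on already sorted input), an insertion sort for small subarrays, and recursing only into the smaller partition to bound the stack depth. The following C sketch combines them. It is an illustration under stated assumptions, not the answer given to this question: `quick` and `partition` keep the asker's names but are rewritten, the helpers `swap_int`, `insertion_sort` and `median_of_three` are hypothetical, and the cutoff of 16 is an untuned guess.

```c
/* Subarrays of this size or smaller go to insertion sort (assumed value). */
#define SMALL_CUTOFF 16

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sort a[lower..upper] (inclusive) by insertion. */
static void insertion_sort(int a[], int lower, int upper)
{
    for (int i = lower + 1; i <= upper; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= lower && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Move the median of a[lower], a[mid], a[upper] into a[lower]. */
static void median_of_three(int a[], int lower, int upper)
{
    int mid = lower + (upper - lower) / 2;
    if (a[mid] < a[lower]) swap_int(&a[mid], &a[lower]);
    if (a[upper] < a[lower]) swap_int(&a[upper], &a[lower]);
    if (a[upper] < a[mid]) swap_int(&a[upper], &a[mid]);
    swap_int(&a[lower], &a[mid]); /* median becomes the pivot */
}

/* Partition a[lower..upper] around a[lower]; return the pivot's final index. */
static int partition(int a[], int lower, int upper)
{
    int pivot = a[lower];
    int i = lower + 1;
    int j = upper;
    for (;;) {
        while (i <= j && a[i] <= pivot) i++;
        while (a[j] > pivot) j--; /* stops at lower at the latest */
        if (i >= j) break;
        swap_int(&a[i], &a[j]);
    }
    swap_int(&a[lower], &a[j]);
    return j;
}

/* Sort a[lower..upper] (inclusive). */
void quick(int a[], int lower, int upper)
{
    while (upper - lower + 1 > SMALL_CUTOFF) {
        median_of_three(a, lower, upper);
        int loc = partition(a, lower, upper);
        if (loc - lower < upper - loc) { /* recurse into the smaller side */
            quick(a, lower, loc - 1);
            lower = loc + 1;
        } else {
            quick(a, loc + 1, upper);
            upper = loc - 1;
        }
    }
    insertion_sort(a, lower, upper);
}
```

It is called the same way as the original, `quick(a, 0, n-1)`. Arrays with many equal keys still partition unevenly with this scheme, so a three-way partition may be worth considering for such data.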

Verifying compiler optimizations in gcc/g++ by analyzing assembly listings

谁说我不能喝 submitted on 2019-12-03 01:02:49
I just asked a question related to how the compiler optimizes certain C++ code, and I was looking around SO for any questions about how to verify that the compiler has performed certain optimizations. I was trying to look at the assembly listing generated with g++ ( g++ -c -g -O2 -Wa,-ahl=file.s file.c ) to possibly see what is going on under the hood, but the output is too cryptic for me. What techniques do people use to tackle this problem, and are there any good references on how to interpret the assembly listings of optimized code or articles specific to the GCC toolchain that talk about