micro-optimization

Is it possible to check if any of 2 sets of 3 ints is equal with less than 9 comparisons?

放肆的年华 submitted on 2019-12-06 20:16:14
Question: int eq3(int a, int b, int c, int d, int e, int f){ return a == d || a == e || a == f || b == d || b == e || b == f || c == d || c == e || c == f; } This function receives six ints and returns true if any of the first three ints is equal to any of the last three. Is there a bitwise hack or similar trick to make it faster?

Answer 1: Expanding on dawg's SSE comparison method, you can combine the results of the comparisons using a vector OR, and move a mask of the compare results back to an integer to test
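The approach described in the answer excerpt (vector compares, a vector OR, then a mask moved back to an integer) could look roughly like the sketch below. It is not the answerer's code: the function names `eq3_scalar` and `eq3_simd` are made up for illustration, the fourth vector lane is padded with a duplicate of `f` as an assumption to avoid false matches, and a scalar fallback is used when SSE2 is not available. Whether it is actually faster than the nine scalar comparisons has to be measured.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define EQ3_HAVE_SSE2 1
#endif

/* Reference version: the nine comparisons from the question. */
int eq3_scalar(int a, int b, int c, int d, int e, int f) {
    return a == d || a == e || a == f ||
           b == d || b == e || b == f ||
           c == d || c == e || c == f;
}

/* Sketch: three vector compares, OR-ed together, then one mask test. */
int eq3_simd(int a, int b, int c, int d, int e, int f) {
#ifdef EQ3_HAVE_SSE2
    /* Lanes 0..2 hold d, e, f; lane 3 repeats f so it cannot add a false match. */
    __m128i rhs = _mm_set_epi32(f, f, e, d);
    __m128i m = _mm_cmpeq_epi32(_mm_set1_epi32(a), rhs);
    m = _mm_or_si128(m, _mm_cmpeq_epi32(_mm_set1_epi32(b), rhs));
    m = _mm_or_si128(m, _mm_cmpeq_epi32(_mm_set1_epi32(c), rhs));
    /* Any equal lane is all-ones, so the byte mask is nonzero on a match. */
    return _mm_movemask_epi8(m) != 0;
#else
    return eq3_scalar(a, b, c, d, e, f);
#endif
}
```

Moving six scalars into vector registers has its own cost, so a sketch like this is more likely to pay off when the values already live in memory or in vector registers.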

How should I return multiple variables in a function (for best practices)?

青春壹個敷衍的年華 submitted on 2019-12-06 20:12:06
Question: Just curious to know what the best practice would be for something like this: a function that returns multiple variables. How should one return these variables? Like this (using globals): function myfun(){ global $var1,$var2,$var3; $var1="foo"; $var2="foo"; $var3="foo"; }//end of function Or like this (returning an array): function myfun(){ $var1="foo"; $var2="foo"; $var3="foo"; $ret_var=array("var1"=>$var1,"var2"=>$var2,"var3"=>$var3); return $ret_var; }//end of function I did a performance

On improving Haskell's performance compared to C in fibonacci micro-benchmark

谁说我不能喝 submitted on 2019-12-06 17:56:19
Question: I came across this question, which compared the performance of various compilers on computing Fibonacci numbers the naive way. I tried doing this with Haskell to see how it compares to C. C code: #include <stdio.h> #include <stdlib.h> int fib (int n) { if (n < 2) return 1; return fib (n-1) + fib (n-2); } int main (int argc, char* argv[]) { printf ("%i\n", fib (atoi(argv[1]))); return 0; } Result: > gcc -O3 main.c -o fib > time ./fib 40 165580141 real 0m0.421s user 0m0.420s sys 0m0.000s Haskell

PHP micro-optimization

本秂侑毒 submitted on 2019-12-06 11:14:21
How can I spot useless micro-optimization techniques? What should be avoided? Any optimization done without being measured and profiled first is useless. PHP code profilers: Xdebug, PHP_Debug, time (sometimes it is easy to spot bottlenecks in the code using a simple echo time()). Always measure before optimizing! Write code that works and is readable. If you find it sluggish, you can always do some profiling. I'll make myself unpopular and say isset. To check for undefined variables, isset() is often used throughout application logic. Many people, however, only use it with the intent to suppress

PHP null and copy-on-write

流过昼夜 submitted on 2019-12-06 11:08:52
Suppose I want to have two variables and have them both equal to null. (More realistically, I am thinking about an array that contains a large number of nulls, but the "two variables" scenario is sufficient for the question.) Obviously, I can do this in more than one way. I can do this (method 1): $a = null; $b = $a; By my understanding, the result of this is that there is one zval that is pointed to by two entries in the symbol table: 'a' and 'b'. But alternatively one might do this (method 2): $a = null; $b = null; Naively one would expect that this should result in two different zvals,

One instruction to clear PF (Parity Flag) — get odd number of bits in result register

房东的猫 submitted on 2019-12-06 08:13:17
Question: In x86 assembly, is it possible to clear the Parity Flag in one and only one instruction, working under any initial register configuration? This is equivalent to creating a result register with an odd number of set bits, with any operation that sets flags (expressly excluding mov). For contrast, setting the parity flag can be done in one instruction:
cmp bl, bl
And there are many ways to clear the parity flag with two instructions:
and bl, 0
or bl, 1
However, the one-instruction method remains
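To make the flag's definition concrete, here is a small C model of how x86 derives PF: it is set when the least-significant byte of a result contains an even number of 1 bits, and cleared otherwise. The function name `parity_flag` is hypothetical and the code only models the flag; it does not answer the one-instruction question.

```c
#include <stdint.h>

/* Models the x86 Parity Flag for a result whose low byte is `low8`.
 * Returns 1 (PF set) for an even count of 1 bits, 0 (PF clear) for odd. */
int parity_flag(uint8_t low8) {
    unsigned ones = 0;
    for (uint8_t v = low8; v != 0; v >>= 1) {
        ones += v & 1u;   /* count the lowest bit, then shift it out */
    }
    return (ones % 2u) == 0;
}
```

Under this model, `cmp bl, bl` computes a result of 0 (zero 1 bits, so PF is set), while `and bl, 0` followed by `or bl, 1` leaves a result of 1 (one 1 bit, so PF is clear), matching the examples in the question.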

Why compiled lambda build over Expression.Call is slightly slower than delegate that should do the same?

本秂侑毒 submitted on 2019-12-06 07:55:08
Question: Why is a compiled lambda built over Expression.Call slightly slower than a delegate that should do the same? And how can this be avoided? Explaining the BenchmarkDotNet results: we are comparing CallBuildedReal vs CallLambda; the other two, CallBuilded and CallLambdaConst, are "subforms" of CallLambda and show the same numbers. But the difference from CallBuildedReal is significant. //[Config(typeof(Config))] [RankColumn, MinColumn, MaxColumn, StdDevColumn, MedianColumn] [ClrJob , CoreJob] [HtmlExporter,

can array access be optimized?

最后都变了- submitted on 2019-12-06 06:18:33
Question: Maybe I'm being misled by my profiler (Netbeans), but I'm seeing some odd behavior, and I'm hoping someone here can help me understand it. I am working on an application which makes heavy use of rather large hash tables (keys are longs, values are objects). The performance with the built-in Java hash table (HashMap specifically) was very poor, and after trying some alternatives -- Trove, Fastutils, Colt, Carrot -- I started working on my own. The code is very basic using a double hashing

Avoiding AVX-SSE (VEX) Transition Penalties

荒凉一梦 submitted on 2019-12-06 06:01:40
Question: Our 64-bit application has lots of code (inter alia, in standard libraries) that uses xmm0-xmm7 registers in SSE mode. I would like to implement fast memory copy using ymm registers. I cannot modify all the code that uses xmm registers to add the VEX prefix, and I also think that this is not practical, since it would increase the size of the code and could make it run slower because the CPU has to decode larger instructions. I just want to use two ymm registers (and possibly zmm - the
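For context, the transition penalty on Intel CPUs arises when legacy-SSE instructions execute while the upper halves of the ymm registers are dirty, and the VZEROUPPER instruction clears those upper halves. A minimal sketch of a ymm-based copy that ends with VZEROUPPER is shown below. The function name `copy_with_ymm` is hypothetical, the AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`), and this is not presented as the accepted answer to the question; note also that compilers building with AVX enabled typically insert VZEROUPPER themselves.

```c
#include <stddef.h>
#include <string.h>

#if defined(__AVX__)
#include <immintrin.h>   /* AVX intrinsics */
#endif

/* Copies n bytes from src to dst (regions must not overlap). */
void copy_with_ymm(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__AVX__)
    /* Move 32 bytes per iteration through one ymm register. */
    while (n >= 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)s);
        _mm256_storeu_si256((__m256i *)d, v);
        s += 32;
        d += 32;
        n -= 32;
    }
    /* Clear the upper ymm halves before any legacy-SSE code runs again. */
    _mm256_zeroupper();
#endif
    /* Tail bytes (or the whole buffer when AVX is not compiled in). */
    memcpy(d, s, n);
}
```

Whether this beats the platform's own `memcpy` depends on the CPU and buffer sizes, so it should be benchmarked rather than assumed.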

How do I reduce execution time and number of cycles for a factorial loop? And/or code-size?

我只是一个虾纸丫 submitted on 2019-12-06 05:54:45
Basically, I'm having a hard time getting the execution time any lower than it is, as well as reducing the number of clock cycles and the memory size. Does anyone have any idea how I can do this? The code works fine; I just want to change it a bit. I wrote working code and don't want to break it, but I also don't know what changes to make. ; Calculation of a factorial value using a simple loop ; set up the exception addresses THUMB AREA RESET, CODE, READONLY EXPORT __Vectors EXPORT Reset_Handler __Vectors DCD 0x00180000 ; top of the stack DCD Reset_Handler ; reset vector - where the