micro-optimization | 易学教程

How could this Java code be sped up?

I am trying to benchmark how fast Java can do a simple task: read a huge file into memory and then perform some meaningless calculations on the data. All types of optimizations count, whether that means rewriting the code differently, using a different JVM, or tricking the JIT.

The input file is a list of 500 million pairs of 32-bit integers, with the two values of each pair separated by a comma, like this:

    44439,5023
    33140,22257
    ...

This file takes 5.5 GB on my machine. The program can't use more than 8 GB of RAM and can use only a single thread.

    package speedracer;
    import java.io.FileInputStream;
    import java.nio.MappedByteBuffer;
    import
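The excerpt cuts off before the parsing code, so the following is only a minimal sketch of one way to scan such input byte by byte, not the asker's implementation. The names `PairParser` and `sumAll` are hypothetical, summing the values is a stand-in for the unspecified "meaningless calculations", and the sketch assumes the file contains only non-negative ASCII decimal numbers separated by non-digit bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the question.
final class PairParser {
    /**
     * Sums every non-negative decimal integer found in the buffer.
     * Any non-digit byte (comma, space, newline) ends the current number.
     * Assumption: each value fits in a 32-bit int, as the question states.
     */
    static long sumAll(ByteBuffer buf) {
        long sum = 0;
        int current = 0;
        boolean inNumber = false;
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b >= '0' && b <= '9') {
                // Accumulate the next decimal digit.
                current = current * 10 + (b - '0');
                inNumber = true;
            } else if (inNumber) {
                // A separator closes the number that was being read.
                sum += current;
                current = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            sum += current; // input ended without a trailing separator
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] sample = "44439,5023\n33140,22257\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumAll(ByteBuffer.wrap(sample))); // prints 104859
    }
}
```

Because a `MappedByteBuffer` is a `ByteBuffer`, the same method would accept a memory-mapped region. A single mapping cannot exceed `Integer.MAX_VALUE` bytes, so a 5.5 GB file would have to be mapped in several chunks, with care taken for numbers that straddle a chunk boundary.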

More efficient way to loop?

I have a small piece of code from a much larger script. I figured out that the function t_area is responsible for most of the run time. I tested the function by itself and it is not slow; I believe it takes a lot of time because of the number of times it has to be run. Here is the code where the function is called:

    tri_area = np.zeros((numx,numy),dtype=float)
    for jj in range(0,numy-1):
        for ii in range(0,numx-1):
            xp = x[ii,jj]
            yp = y[ii,jj]
            zp = surface[ii,jj]
            ap = np.array((xp,yp,zp))
            xp = xp+dx
            zp = surface[ii+1,jj]
            bp = np.array((xp,yp,zp))
            yp = yp+dx
            zp = surface[ii

According to Intel my cache should be 24-way associative though its 12-way, how is that?

According to the “Intel 64 and IA-32 Architectures Optimization Reference Manual” (April 2012), page 2-23:

"The physical addresses of data kept in the LLC data arrays are distributed among the cache slices by a hash function, such that addresses are uniformly distributed. The data array in a cache block may have 4/8/12/16 ways corresponding to 0.5M/1M/1.5M/2M block size. However, due to the address distribution among the cache blocks from the software point of view, this does not appear as a normal N-way cache."

My computer is a 2-core Sandy Bridge with a 3 MB, 12-way set-associative LLC. That

Faster implementation of Math.round?

Are there any drawbacks to this code, which appears to be a faster (and correct) version of java.lang.Math.round?

    public static long round(double d) {
        if (d > 0) {
            return (long) (d + 0.5d);
        } else {
            return (long) (d - 0.5d);
        }
    }

It takes advantage of the fact that, in Java, converting a double to long truncates toward zero.

There are some special cases which the built-in method handles and which your code does not. From the documentation: If the argument is NaN, the result is 0. If the argument is negative infinity or any value less than or equal to the value of Integer.MIN_VALUE, the result is
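One drawback can be checked directly: `Math.round` is documented to round ties toward positive infinity, while the add-or-subtract-0.5 variant rounds ties away from zero, so the two disagree on negative halves. The sketch below copies the question's method under the hypothetical name `fastRound` in a hypothetical class `RoundCheck` and prints both results side by side.

```java
// Hypothetical wrapper class; fastRound is the method body from the question.
final class RoundCheck {
    /** Adds or subtracts 0.5, then truncates toward zero via the cast. */
    static long fastRound(double d) {
        if (d > 0) {
            return (long) (d + 0.5d);
        } else {
            return (long) (d - 0.5d);
        }
    }

    public static void main(String[] args) {
        double[] samples = {2.5, -2.5, -0.5, 1.4, -1.6, Double.NaN};
        for (double d : samples) {
            // The two columns differ only for -2.5 and -0.5 in this sample set.
            System.out.println(d + ": fastRound=" + fastRound(d)
                    + " Math.round=" + Math.round(d));
        }
    }
}
```

For -2.5 the variant returns -3 where `Math.round` returns -2, and for -0.5 it returns -1 where `Math.round` returns 0. For NaN both return 0, because a Java cast from NaN to long yields 0.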

Is it possible to check if any of 2 sets of 3 ints is equal with less than 9 comparisons?

    int eq3(int a, int b, int c, int d, int e, int f){
        return a == d || a == e || a == f
            || b == d || b == e || b == f
            || c == d || c == e || c == f;
    }

This function receives 6 ints and returns true if any of the first 3 ints is equal to any of the last 3 ints. Is there a bitwise hack or a similar way to make it faster?

Expanding on dawg's SSE comparison method, you can combine the results of the comparisons using a vector OR, and move a mask of the compare results back to an integer to test for 0 / non-zero. Also, you can get data into vectors more efficiently (although it's still pretty clunky to

How should I return multiple variables in a function (for best practices)?

Just curious to know what the best practice would be for something like this: a function that returns multiple variables. How should one return these variables?

Like this (globalizing):

    function myfun(){
        global $var1,$var2,$var3;
        $var1="foo";
        $var2="foo";
        $var3="foo";
    }//end of function

or like this (returning an array):

    function myfun(){
        $var1="foo";
        $var2="foo";
        $var3="foo";
        $ret_var=array("var1"=>$var1,"var2"=>$var2,"var3"=>$var3);
        return $ret_var;
    }//end of function

I did a performance test, and it looks like using arrays is faster (after a few refreshes): array took: 5.9999999999505E-6

Can the order of cases in a switch statement affect performance?

Question: Let's say I have a switch statement as below:

    switch(alphabet) {
        case "f": //do something
            break;
        case "c": //do something
            break;
        case "a": //do something
            break;
        case "e": //do something
            break;
    }

Now suppose I know that the alphabet e occurs most frequently, followed by a, c and f respectively. So I restructured the order of the case statements as follows:

    switch(alphabet) {
        case "e": //do something
            break;
        case "a": //do something
            break;
        case "c": //do something
            break;
        case "f"

One instruction to clear PF (Parity Flag) — get odd number of bits in result register

In x86 assembly, is it possible to clear the Parity Flag in one and only one instruction, working under any initial register configuration? This is equivalent to creating a result register with an odd number of set bits, with any operation that sets flags (expressly excluding mov).

For contrast, setting the parity flag can be done in one instruction:

    cmp bl, bl

And there are many ways to clear the parity flag with two instructions:

    and bl, 0
    or bl, 1

However, the one-instruction method remains elusive.

Not possible. None of the PF-changing instructions can unconditionally produce an odd-parity result

Why is a compiled lambda built over Expression.Call slightly slower than a delegate that should do the same?

Why is a compiled lambda built over Expression.Call slightly slower than a delegate that should do the same? And how can this be avoided?

Explaining the BenchmarkDotNet results: we are comparing CallBuildedReal vs CallLambda; the other two, CallBuilded and CallLambdaConst, are "subforms" of CallLambda and show equal numbers. But the difference from CallBuildedReal is significant.

    //[Config(typeof(Config))]
    [RankColumn, MinColumn, MaxColumn, StdDevColumn, MedianColumn]
    [ClrJob , CoreJob]
    [HtmlExporter, MarkdownExporter]
    [MemoryDiagnoser /*, InliningDiagnoser*/]
    public class BenchmarkCallSimple
    {
        static Func

C++ Adding 2 arrays together quickly

Question: Given the arrays:

    int canvas[10][10];
    int addon[10][10];

where all the values range from 0 to 100, what is the fastest way in C++ to add those two arrays so that each cell in canvas equals itself plus the corresponding cell value in addon? That is, I want to achieve something like:

    canvas += another;

So if canvas[0][0] = 3 and addon[0][0] = 2, then canvas[0][0] = 5. Speed is essential here as I am writing a very simple program to brute-force a knapsack-type problem and there will be tens of millions of
