micro-optimization

Can array access be optimized?

空扰寡人 submitted on 2019-12-04 09:34:55
Maybe I'm being misled by my profiler (NetBeans), but I'm seeing some odd behavior and hoping someone here can help me understand it. I am working on an application which makes heavy use of rather large hash tables (keys are longs, values are objects). The performance with the built-in Java hash table (HashMap specifically) was very poor, and after trying some alternatives -- Trove, Fastutil, Colt, Carrot -- I started working on my own. The code is very basic, using a double-hashing strategy. This works well and shows the best performance of all the other options I've tried thus
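The asker's implementation is not shown in the excerpt, but a minimal sketch of the approach it describes -- an open-addressed table with long keys and double hashing -- might look like the following. All names, the hash constants, and the power-of-two sizing are illustrative assumptions, not the original code:

```java
// Minimal sketch of open addressing with double hashing over long keys.
// Illustrative only: fixed power-of-two capacity, no resizing or removal,
// and key 0 is reserved as the "empty slot" marker.
final class LongObjectMap<V> {
    private final long[] keys;
    private final Object[] values;
    private final int mask;                 // capacity - 1, capacity is a power of two

    LongObjectMap(int capacity) {           // capacity must be a power of two
        keys = new long[capacity];
        values = new Object[capacity];
        mask = capacity - 1;
    }

    private static int hash1(long k) { return (int) (k ^ (k >>> 32)); }

    // An odd step size guarantees the probe sequence visits every slot
    // when the table length is a power of two.
    private static int hash2(long k) { return (int) ((k * 0x9E3779B97F4A7C15L) >>> 33) | 1; }

    @SuppressWarnings("unchecked")
    V get(long key) {
        int idx = hash1(key) & mask;
        int step = hash2(key);
        while (keys[idx] != 0) {
            if (keys[idx] == key) return (V) values[idx];
            idx = (idx + step) & mask;      // probe with the second hash, not +1
        }
        return null;
    }

    void put(long key, V value) {
        int idx = hash1(key) & mask;
        int step = hash2(key);
        while (keys[idx] != 0 && keys[idx] != key) {
            idx = (idx + step) & mask;
        }
        keys[idx] = key;
        values[idx] = value;
    }
}
```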

What is faster in Python, “while” or “for xrange”

坚强是说给别人听的谎言 submitted on 2019-12-04 08:50:20
We can do numeric iteration like: for i in xrange(10): print i, or in C style: i = 0 while i < 10: print i, i = i + 1 Yes, I know the first one is less error-prone and more pythonic, but is it as fast as the C-style version? PS. I'm from the C++ planet and pretty new to the Python one. I am sure the while version is slower. Python has to look up the add operation for the integer object on each turn of the loop, etc.; it is not pure C just because it looks like it! And if you want a pythonic version of exactly the above, use: print " ".join(str(i) for i in xrange(10)) Edit: My timings look like this.
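The answer's timings are cut off above; if you want to reproduce the comparison yourself, a small measurement along these lines should work (Python 2 syntax, to match the question's use of xrange; the loop bounds and repeat counts are arbitrary):

```python
# Rough comparison of the two loop styles (Python 2, matching the question).
import timeit

for_loop = """
for i in xrange(10000):
    pass
"""

while_loop = """
i = 0
while i < 10000:
    i = i + 1
"""

# Take the minimum of several runs to reduce noise from other processes.
print "for/xrange:", min(timeit.repeat(for_loop, repeat=5, number=1000))
print "while:     ", min(timeit.repeat(while_loop, repeat=5, number=1000))
```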

Use of lazy val for caching string representation

半城伤御伤魂 submitted on 2019-12-04 06:14:29
I encountered the following code in JAXMag's Scala special issue: package com.weiglewilczek.gameoflife case class Cell(x: Int, y: Int) { override def toString = position private lazy val position = "(%s, %s)".format(x, y) } Does the use of lazy val in the above code provide considerably better performance than the following code? package com.weiglewilczek.gameoflife case class Cell(x: Int, y: Int) { override def toString = "(%s, %s)".format(x, y) } Or is it just a case of unnecessary optimization? One thing to note about lazy vals is that, while they are only calculated once, every access to
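To make the trade-off concrete, here are the two variants from the question in runnable form, with a tiny driver added purely for illustration: the lazy version formats the string only on first access but pays an initialization check and an extra field per instance, while the plain version re-formats on every call.

```scala
// Lazy version: the string is built once, on first toString call.
case class CellLazy(x: Int, y: Int) {
  override def toString = position
  private lazy val position = "(%s, %s)".format(x, y)
}

// Plain version: the string is rebuilt on every toString call,
// but the instance carries no extra field or initialization flag.
case class CellPlain(x: Int, y: Int) {
  override def toString = "(%s, %s)".format(x, y)
}

object Demo extends App {
  val c = CellLazy(1, 2)
  println(c)                 // formats here
  println(c)                 // reuses the cached string
  println(CellPlain(1, 2))   // formats again on each call
}
```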

How to get lg2 of a number that is 2^k

╄→尐↘猪︶ㄣ submitted on 2019-12-04 03:31:10
What is the best way to get the base-2 logarithm of a number that I know is a power of two (2^k)? (Of course I know only the value 2^k, not k itself.) One way I thought of is subtracting 1 and then doing a bit count: lg2(n) = bitcount(n - 1) = k when n = 2^k; e.g. 0b10000 - 1 = 0b01111, and bitcount(0b01111) = 4. But is there a faster way of doing it (without caching)? Something about as fast that doesn't involve a bit count would also be nice to know. One of the applications of this is: suppose you have bitmask 0b0110111000 and value 0b0101010101 and you are interested in (value &
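One widely used alternative to the bit-count trick is to count trailing zeros, which for a power of two equals its base-2 logarithm. A hedged sketch using the GCC/Clang builtin (many CPUs implement this as a single instruction; other compilers have equivalents):

```c
#include <stdint.h>
#include <stdio.h>

/* For n known to be a power of two, log2(n) equals the number of
 * trailing zero bits, which GCC/Clang expose as __builtin_ctz. */
static inline unsigned lg2_pow2(uint32_t n)
{
    return (unsigned)__builtin_ctz(n);   /* undefined for n == 0 */
}

int main(void)
{
    printf("%u\n", lg2_pow2(0x10u));     /* prints 4 */
    printf("%u\n", lg2_pow2(1u));        /* prints 0 */
    return 0;
}
```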

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256

本小妞迷上赌 submitted on 2019-12-04 03:21:47
Question: I had been using _mm256_lddqu_si256 based on an example I found online. Later I discovered _mm256_loadu_si256. The Intel Intrinsics Guide only states that the lddqu version may perform better when crossing a cache-line boundary. What might be the advantages of loadu? In general, how are these functions different? Answer 1: There's no reason to ever use _mm256_lddqu_si256; consider it a synonym for _mm256_loadu_si256. lddqu only exists for historical reasons as x86 evolved towards having better
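Both intrinsics perform an unaligned 32-byte load and return the same data; they differ only in which instruction the compiler emits (vlddqu vs. vmovdqu). A minimal check you can compile yourself (assumes an AVX2-capable compiler, e.g. gcc -mavx2; the buffer contents and offset are arbitrary):

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 33 bytes so a load at offset 1 is guaranteed to be unaligned. */
    uint8_t buf[33];
    for (int i = 0; i < 33; i++) buf[i] = (uint8_t)i;

    __m256i a = _mm256_loadu_si256((const __m256i *)(buf + 1));
    __m256i b = _mm256_lddqu_si256((const __m256i *)(buf + 1));

    /* Both loads return identical data; only the emitted instruction differs. */
    __m256i eq = _mm256_cmpeq_epi8(a, b);
    printf("equal: %d\n", _mm256_movemask_epi8(eq) == -1);
    return 0;
}
```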

Does calling the constructor of an empty class actually use any memory?

我是研究僧i submitted on 2019-12-04 03:08:10
Suppose I have a class like class Empty{ Empty(int a){ cout << a; } }; And then I invoke it using int main(){ Empty(2); return 0; } Will this cause any memory to be allocated on the stack for the creation of an "Empty" object? Obviously the arguments need to be pushed onto the stack, but I don't want to incur any extra overhead. Basically I am using the constructor as a static member. The reason I want to do this is because of templates. The actual code looks like template <int which> class FuncName{ template <class T> FuncName(const T &value){ if(which == 1){ // specific behavior }else if
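For reference, here is a compilable version of the snippet (the constructor is made accessible and the missing header and semicolon are added). Note that sizeof an empty class is 1 by rule, but a temporary with no data members whose address is never taken is typically optimized away entirely; inspecting the generated assembly for your compiler and flags is the only way to be certain.

```cpp
#include <iostream>

// Empty class used only for the side effect of its constructor.
struct Empty {
    explicit Empty(int a) { std::cout << a << '\n'; }
};

int main() {
    Empty(2);                              // temporary constructed and destroyed immediately
    std::cout << sizeof(Empty) << '\n';    // prints 1: an empty class still has nonzero size
    return 0;
}
```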

std::vector-like class optimized to hold a small number of items [duplicate]

隐身守侯 submitted on 2019-12-03 22:34:38
This question already has answers here: small string optimization for vector? (4 answers). Closed 4 years ago. In one time-critical part of the program there is a member of the class that looks like this: std::vector m_vLinks; During profiling I noticed that in about 99.98% of executions this vector holds only 0 or 1 items. However, in very rare cases it might hold more. This vector is definitely a bottleneck according to the profiler, so I'm thinking about the following optimization: craft a hand-made class with a vector-like interface. This class will hold the true size, one item, and an optional pointer to the
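A hedged sketch of the optimization being described -- one element stored inline, spilling to a heap-allocated std::vector only when a second element arrives -- could look like this (names and details are illustrative; copy/move support, iterators, and exception safety are omitted for brevity):

```cpp
#include <cstddef>
#include <vector>

// Illustrative "tiny vector": holds 0 or 1 items inline and allocates
// a real std::vector only in the rare multi-element case.
template <typename T>
class TinyVec {
    T inline_item_{};                      // used while size_ <= 1
    std::size_t size_ = 0;
    std::vector<T>* overflow_ = nullptr;   // allocated lazily on the second push_back

public:
    ~TinyVec() { delete overflow_; }       // copy/move omitted in this sketch

    void push_back(const T& v) {
        if (size_ == 0) {
            inline_item_ = v;
        } else {
            if (!overflow_) {
                overflow_ = new std::vector<T>();
                overflow_->push_back(inline_item_);   // move existing item to the heap
            }
            overflow_->push_back(v);
        }
        ++size_;
    }

    std::size_t size() const { return size_; }

    const T& operator[](std::size_t i) const {
        return overflow_ ? (*overflow_)[i] : inline_item_;
    }
};
```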

Fast search and replace some nibble in int [c; microoptimisation]

廉价感情. submitted on 2019-12-03 16:00:57
This is a variant of the question Fast search of some nibbles in two ints at same offset (C, microoptimisation), with a different task: find a predefined nibble in an int32 and replace it with another nibble. For example, with search nibble 0x5 and replacement nibble 0xE, the input 0x3d542753 becomes the output 0x3dE427E3 (both 5 nibbles are replaced). There can be other pairs of search/replace nibbles (all known at compile time). I checked my program: this part is one of the hottest spots (gprof-proven, 75% of the time is in the function), and it is called very many times (gcov-proven). Actually it
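One well-known branch-free (SWAR) way to do this -- not necessarily the approach chosen in the original thread -- is to XOR the word with the search nibble replicated into every position, collapse each all-zero nibble into a mask, and blend in the replacement. A sketch for 32-bit values:

```c
#include <stdint.h>
#include <stdio.h>

/* Replace every nibble of x equal to `find` with `repl`, branch-free.
 * Constants assume 32-bit words; find and repl are single nibbles (0..15). */
static uint32_t replace_nibble(uint32_t x, uint32_t find, uint32_t repl)
{
    uint32_t y = x ^ (find * 0x11111111u); /* matching nibbles become 0 */
    uint32_t t = y | (y >> 2);
    t |= t >> 1;                           /* low bit of each nibble = OR of its 4 bits */
    uint32_t match = ~t & 0x11111111u;     /* 1 in the low bit of every matching nibble */
    uint32_t mask  = match * 0xFu;         /* expand to 0xF per matching nibble */
    return (x & ~mask) | ((repl * 0x11111111u) & mask);
}

int main(void)
{
    /* The example from the question: 0x3D542753 with 5 -> E gives 0x3DE427E3. */
    printf("0x%08X\n", replace_nibble(0x3D542753u, 0x5u, 0xEu));
    return 0;
}
```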

Java: if-return-if-return vs if-return-elseif-return

血红的双手。 submitted on 2019-12-03 14:40:43
Question: I asked an unrelated question where I had code like this: public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; // Check property values } I got a comment claiming that this was not optimal, and that it should instead (if I understood correctly) be this: public boolean equals(Object obj) { if (this == obj) return true; else if (obj == null) return false; else if (getClass() != obj.getClass()) return
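For reference, here is a compilable version of the first style (the property comparison is a placeholder field added purely for illustration, not from the question). Because each branch returns, the trailing else keywords in the second style are redundant, and both forms typically compile to identical bytecode, so this is a readability question rather than a performance one.

```java
// The "if-return-if-return" style; adding `else` before the later ifs changes nothing,
// since control never reaches them once an earlier branch has returned.
public class Example {
    private final int value;   // illustrative field standing in for "property values"

    public Example(int value) { this.value = value; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        // Check property values
        return this.value == ((Example) obj).value;
    }

    @Override
    public int hashCode() { return value; }
}
```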

Smart JVM and JIT Micro-Optimizations

◇◆丶佛笑我妖孽 submitted on 2019-12-03 13:34:40
Question: Over time, Sun's JVM and JIT have gotten pretty smart. Things that used to be common knowledge as necessary micro-optimizations are no longer needed, because they get taken care of for you. For example, it used to be the case that you should mark all possible classes as final so the JVM could inline as much code as possible. Now, however, the JIT knows whether your class is final based on what classes get loaded at runtime, and if you load a class to make the original one non-final-able,
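To see this in action, a small illustrative program with a hot, monomorphic virtual call on a deliberately non-final class can be run with the HotSpot diagnostic flags shown in the comment to watch the JIT inline it anyway; the flags are standard HotSpot options, but the output format varies between JVM versions.

```java
// Demo: a non-final class whose virtual call the JIT can still inline
// as long as only one implementation has been loaded (monomorphic call site).
// Run with: java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Hot
public class Hot {
    static class Shape {                    // deliberately NOT final
        int area(int s) { return s * s; }
    }

    public static void main(String[] args) {
        Shape shape = new Shape();
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += shape.area(i & 0xFF);    // hot monomorphic call, a candidate for inlining
        }
        System.out.println(sum);            // keep the result live so the loop isn't eliminated
    }
}
```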