micro-optimization

Alternative schemes for implementing vptr?

匆匆过客 submitted on 2019-12-19 17:40:07
Question: This question is not about the C++ language itself (i.e., not about the Standard) but about how to get a compiler to implement alternative schemes for virtual functions. The general scheme for implementing virtual functions uses a pointer to a table of pointers. class Base { private: int m; public: virtual void metha(); }; The equivalent in, say, C would be something like struct Base { void (**vtable)(); int m; }; where the first member is usually a pointer to a list of virtual functions, etc. (a piece of
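The pointer-to-table scheme described in this excerpt can be sketched by hand. This is a minimal illustration, not what any particular compiler emits: the names BaseVTable, base_metha, derived_metha, call_metha and demo are hypothetical, the table is a struct with one typed slot rather than the raw void (**vtable)() from the question, and metha is assumed to return int so the result can be checked.

```cpp
#include <cassert>

struct Base;

// One table per class: function pointers that take the object explicitly.
// (Hypothetical layout, assumed for illustration.)
struct BaseVTable {
    int (*metha)(const Base* self);
};

struct Base {
    const BaseVTable* vtable;  // first member: pointer to the class's table
    int m;
};

// Two implementations that could fill the same slot.
int base_metha(const Base* self) { return self->m; }
int derived_metha(const Base* self) { return self->m * 2; }

const BaseVTable base_vtable{&base_metha};
const BaseVTable derived_vtable{&derived_metha};

// A "virtual call": load the table pointer, load the slot, call indirectly.
int call_metha(const Base* obj) { return obj->vtable->metha(obj); }

// Usage sketch: the same call site dispatches to different functions.
inline void demo() {
    Base b{&base_vtable, 21};
    Base d{&derived_vtable, 21};
    assert(call_metha(&b) == 21);
    assert(call_metha(&d) == 42);
}
```

Each call through call_metha costs two dependent loads plus an indirect call, which is the overhead that alternative schemes usually try to reduce.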

Unset the most significant bit in a word (int32) [C]

拈花ヽ惹草 submitted on 2019-12-19 03:22:25
Question: How can I unset the most significant set bit of a word (e.g. 0x00556844 -> 0x00156844)? There is a __builtin_clz in gcc, but it only counts the leading zeroes, which is not what I need. Also, what should I use in place of __builtin_clz for MSVC or the Intel C compiler? Currently my code is int msb = 1<< ((sizeof(int)*8)-__builtin_clz(input)-1); int result = input & ~msb; UPDATE: OK, if you say that this code is rather fast, then I'll ask: how should I make this code portable? This version is for GCC, but
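One portable way to get the same result without any compiler builtin is bit smearing. The sketch below is an alternative technique, not the asker's code: clear_msb is a hypothetical name and the input is assumed to be a 32-bit unsigned value. Unlike __builtin_clz, whose result is undefined for an input of 0, this version returns 0 for 0. On MSVC the _BitScanReverse intrinsic is the usual counterpart to __builtin_clz, but it is not used here.

```cpp
#include <cassert>
#include <cstdint>

// Clears the highest set bit of x; returns 0 when x is 0.
// (Hypothetical helper name, assumed for illustration.)
constexpr std::uint32_t clear_msb(std::uint32_t x) {
    std::uint32_t m = x;
    // Smear the highest set bit into every lower position.
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;
    // m >> 1 is a mask of all bits strictly below the highest set bit.
    return x & (m >> 1);
}

// The example value from the question.
static_assert(clear_msb(0x00556844u) == 0x00156844u, "example from the question");

// Usage sketch covering the edge cases.
inline void demo_clear_msb() {
    assert(clear_msb(0u) == 0u);
    assert(clear_msb(1u) == 0u);
    assert(clear_msb(0x80000000u) == 0u);
}
```

Whether this is faster or slower than the builtin-based version depends on the target; the builtin typically maps to a single instruction where one exists, so measuring is the only reliable answer.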

Determine the optimal size for array with respect to the JVM's memory granularity

旧城冷巷雨未停 submitted on 2019-12-19 03:15:51
Question: When creating the backing array for (e.g.) a collection, you do not really care about the exact size of the array you create; it only needs to be at least as large as you calculated. But thanks to the memory allocation and the VM's array header, it would in some cases be possible to create a somewhat larger array without consuming any more memory - for the Oracle 32-bit VM (at least that's what several sources on the internet claim), memory granularity is 8 (meaning any memory allocation is

Using bools in calculations to avoid branches

强颜欢笑 submitted on 2019-12-18 15:53:02
Question: Here's a little micro-optimization curiosity that I came up with: struct Timer { bool running{false}; int ticks{0}; void step_versionOne(int mStepSize) { if(running) ticks += mStepSize; } void step_versionTwo(int mStepSize) { ticks += mStepSize * static_cast<int>(running); } }; It seems the two methods do practically the same thing. Does the second version avoid a branch (and consequently run faster than the first version), or is any compiler able to do this kind of optimization with -O3?
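The claim that both methods compute the same thing can be checked directly. The sketch below copies the Timer struct from the question and adds a hypothetical helper, versions_agree, that runs both versions from the same starting state. It demonstrates only functional equivalence; whether either version compiles to a branch is a property of the compiler and target, and is best settled by reading the generated assembly.

```cpp
#include <cassert>

struct Timer {
    bool running{false};
    int ticks{0};

    // Version one: explicit branch in the source.
    void step_versionOne(int mStepSize) {
        if (running) ticks += mStepSize;
    }

    // Version two: bool converts to 0 or 1, so the product is 0 or mStepSize.
    void step_versionTwo(int mStepSize) {
        ticks += mStepSize * static_cast<int>(running);
    }
};

// Hypothetical helper: applies each version to an identical Timer
// and reports whether the resulting tick counts match.
inline bool versions_agree(bool running, int start, int step) {
    Timer a;
    a.running = running;
    a.ticks = start;
    Timer b = a;
    a.step_versionOne(step);
    b.step_versionTwo(step);
    return a.ticks == b.ticks;
}

// Usage sketch: both states of the flag, with positive and negative steps.
inline void demo_timer() {
    assert(versions_agree(true, 10, 5));
    assert(versions_agree(false, 10, 5));
    assert(versions_agree(true, 0, -3));
    assert(versions_agree(false, 0, -3));
}
```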

Is there a performance overhead to a private inner class in Java?

情到浓时终转凉″ submitted on 2019-12-18 13:09:26
Question: When I have inner classes with private methods or fields, the compiler has to create synthetic package-private accessor methods to allow the outer class to access those private elements (and vice versa). To avoid that, I usually make all fields, methods, and constructors package-private instead of private. But how about the visibility of the class itself? Is there an overhead to private static class A { A(){} } versus static class A { A(){} } Note that the constructor is package

Passing null pointer to placement new

孤人 submitted on 2019-12-18 10:44:49
Question: The default placement new operator is declared in 18.6 [support.dynamic] ¶1 with a non-throwing exception-specification: void* operator new (std::size_t size, void* ptr) noexcept; This function does nothing except return ptr; so it is reasonable for it to be noexcept, however according to 5.3.4 [expr.new] ¶15 this means that the compiler must check it doesn't return null before invoking the object's constructor: -15- [ Note: unless an allocation function is declared with a non-throwing

Is not having local functions a micro optimisation?

纵饮孤独 submitted on 2019-12-18 09:12:29
Question: Would moving the inner function outside of this one, so that it's not created every time the function is called, be a micro-optimisation? In this particular case the doMoreStuff function is only used inside doStuff. Should I worry about having local functions like these? function doStuff() { var doMoreStuff = function(val) { // do some stuff } // do something for (var i = 0; i < list.length; i++) { doMoreStuff(list[i]); for (var j = 0; j < list[i].children.length; j++) { doMoreStuff(list[i]

Fastest implementation of simple, virtual, observer-sort of, pattern in c++?

穿精又带淫゛_ submitted on 2019-12-18 04:23:21
Question: I'm working my arse off trying to implement an alternative to vtables using enums and a ton of macro magic that's really starting to mess with my brain. I'm starting to think I'm not on the right path, since the code is getting uglier and uglier and will not be fit for production by any means. How can the pattern of the following code be implemented with the least amount of redirection/operations? It has to be done in standard C++, up to C++17. class A{ virtual void Update() = 0; // A is

C pointers vs direct member access for structs

这一生的挚爱 submitted on 2019-12-18 04:08:37
Question: Say I have a struct like the following ... typedef struct { int WheelCount; double MaxSpeed; } Vehicle; ... and I have a global variable of this type (I'm well aware of the pitfalls of globals; this is for an embedded system, which I didn't design, and for which they're an unfortunate but necessary evil). Is it faster to access the members of the struct directly or through a pointer? i.e. double LocalSpeed = MyGlobal.MaxSpeed; or double LocalSpeed = pMyGlobal->MaxSpeed; One of my tasks is to
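The two access forms being compared can be written side by side. This is a small sketch under stated assumptions: the initial values 4 and 120.0, the const qualifier on pMyGlobal, and the function names read_direct and read_via_pointer are all hypothetical. It shows only that both forms read the same member; any speed difference depends on whether the compiler can see the pointer's value at compile time, so the generated assembly for the actual target is the thing to inspect.

```cpp
#include <cassert>

struct Vehicle {
    int WheelCount;
    double MaxSpeed;
};

// Global instance, as in the question; the initial values are assumed.
Vehicle MyGlobal{4, 120.0};

// Pointer to the global. Declaring the pointer itself const (an assumption
// here) tells the compiler its target never changes.
Vehicle* const pMyGlobal = &MyGlobal;

// Direct member access: the member's address is known at link time.
double read_direct() { return MyGlobal.MaxSpeed; }

// Access through the pointer: may need an extra load if the pointer's
// value is not visible to the optimizer.
double read_via_pointer() { return pMyGlobal->MaxSpeed; }

// Usage sketch: both forms observe the same object.
inline void demo_vehicle() {
    assert(read_direct() == read_via_pointer());
    MyGlobal.MaxSpeed = 88.5;
    assert(read_via_pointer() == 88.5);
}
```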

Why does my application spend 24% of its life doing a null check?

旧城冷巷雨未停 submitted on 2019-12-17 21:39:09
Question: I've got a performance-critical binary decision tree, and I'd like to focus this question on a single line of code. The code for the binary tree iterator is below with the results from running performance analysis against it. public ScTreeNode GetNodeForState(int rootIndex, float[] inputs) { 0.2% ScTreeNode node = RootNodes[rootIndex].TreeNode; 24.6% while (node.BranchData != null) { 0.2% BranchNodeData b = node.BranchData; 0.5% node = b.Child2; 12.8% if (inputs[b.SplitInputIndex] <= b