micro-optimization | 易学教程

Is thread time spent in synchronization too high?

Question: Today I profiled one of my C# applications using the Visual Studio 2010 Performance Analyzer. Specifically, I was profiling for "Concurrency" because it seemed as though my app should have more capacity than it was demonstrating. The analysis report showed that the threads were spending ~70-80% of their time in a Synchronization state. To be honest, I'm not sure what this means. Does this mean that the application is suffering from a live-lock condition? For context... there are ~30+ long

boost::thread data structure sizes on the ridiculous side?

Compiler: clang++ x86-64 on Linux. It has been a while since I have written any intricate low-level system code, and I usually program against the system primitives (Windows and pthreads/POSIX). So, the ins and outs have slipped from my memory. I am working with boost::asio and boost::thread at the moment. In order to emulate synchronous RPC against an asynchronous function executor (boost::io_service with multiple threads calling io_service::run, where requests are submitted via io_service::post), I am using Boost synchronization primitives. For curiosity's sake I decided to take the sizeof of the primitives.

Unset the most significant bit in a word (int32) [C]

How can I unset the most significant set bit of a word (e.g. 0x00556844 -> 0x00156844)? There is a __builtin_clz in gcc, but it just counts the leading zeroes, which is not what I need on its own. Also, how should I replace __builtin_clz for MSVC or the Intel C compiler? Currently my code is int msb = 1<< ((sizeof(int)*8)-__builtin_clz(input)-1); int result = input & ~msb; UPDATE: OK, if you say that this code is rather fast, I'll ask you: how should I make this code portable? This version is for GCC, but what about MSVC and ICC? Just round down to the nearest power of 2 and then XOR that with the original value, e.g.
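The round-down-then-XOR idea can be sketched in portable C roughly as follows. This is a minimal sketch, not the answer's original code: the names `highest_bit_index` and `clear_msb` are made up for illustration, the MSVC branch assumes the `_BitScanReverse` intrinsic from `<intrin.h>`, and the compiler detection assumes that the Intel compiler defines either `__GNUC__` or `_MSC_VER` depending on the host platform. Unsigned arithmetic is used so that shifting into bit 31 stays well defined.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

/* Hypothetical helper: index (0..31) of the highest set bit.
 * Precondition: x != 0 (the intrinsics are undefined for 0). */
static unsigned highest_bit_index(uint32_t x)
{
    assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
    return 31u - (unsigned)__builtin_clz(x);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse(&idx, x);
    return (unsigned)idx;
#else
    /* Fallback for other compilers: plain shift loop. */
    unsigned idx = 0;
    while (x >>= 1)
        ++idx;
    return idx;
#endif
}

/* Hypothetical helper: clear the most significant set bit; 0 stays 0. */
static uint32_t clear_msb(uint32_t x)
{
    if (x == 0)
        return 0;
    /* Largest power of two <= x, XORed out of the original value. */
    return x ^ (UINT32_C(1) << highest_bit_index(x));
}
```

With the value from the question, `clear_msb(0x00556844u)` yields `0x00156844`.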

Determine the optimal size for array with respect to the JVM's memory granularity

When creating the backing array for (e.g.) a collection, you do not really care about the exact size of the array you create; it only needs to be at least as large as you calculated. But because of the memory allocation granularity and the VM's array header, it would in some cases be possible to create a somewhat larger array without consuming any more memory. For the Oracle 32-bit VM (at least that's what several sources on the internet claim), memory granularity is 8 (meaning any memory allocation is rounded up to the next 8-byte boundary), and array header overhead is 12 bytes. That means when
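The arithmetic behind this can be sketched as below, written in C purely to show the rounding. The 8-byte granularity and 12-byte header are the figures quoted in the question and are treated here as unverified assumptions; the function names `allocated_bytes` and `padded_count` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the question for a 32-bit Oracle VM (assumptions). */
#define ALLOC_GRANULARITY  8u
#define ARRAY_HEADER_BYTES 12u

/* Bytes consumed by an array of `count` elements, header included,
 * rounded up to the allocation granularity. */
static size_t allocated_bytes(size_t count, size_t elem_size)
{
    size_t raw = ARRAY_HEADER_BYTES + count * elem_size;
    return (raw + ALLOC_GRANULARITY - 1) / ALLOC_GRANULARITY * ALLOC_GRANULARITY;
}

/* Largest element count that occupies the same number of bytes as an
 * array of `min_count` elements would. */
static size_t padded_count(size_t min_count, size_t elem_size)
{
    assert(elem_size > 0);
    return (allocated_bytes(min_count, elem_size) - ARRAY_HEADER_BYTES) / elem_size;
}
```

Under these assumptions a byte array needing at least 5 elements occupies 24 bytes, so it could hold 12 elements at no extra cost, while an array of two 4-byte ints could hold three.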

Are there any performance test results for usage of likely/unlikely hints?

gcc features likely/unlikely hints that help the compiler generate machine code with better branch prediction. Is there any data on how proper usage, or failure to use those hints, affects the performance of real code on real systems? MSalters: The question differs, but Peter Cordes's answer on this question gives a clear hint ;). Modern CPUs ignore static hints and use dynamic branch prediction. I don't know of any thorough analysis of such particular hints. In any case, it would be extremely CPU-specific. In general, if you are sure about the likelihood (e.g., > 90%) then it is probably
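For reference, the hints under discussion are usually spelled as macros over GCC's `__builtin_expect`. The sketch below shows one common form; the macro names `LIKELY`/`UNLIKELY` and the function `sum_non_negative` are illustrative choices, not taken from the question, and on other compilers the macros simply fall back to the bare condition. The builtin informs the compiler's code layout; it makes no claim about what the CPU's dynamic predictor does at run time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sum the non-negative entries; negative entries are assumed to be rare
 * and are counted in *skipped instead. */
static long sum_non_negative(const int *values, size_t n, size_t *skipped)
{
    assert(skipped != NULL);
    assert(values != NULL || n == 0);

    long total = 0;
    *skipped = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            ++*skipped;          /* path hinted as cold */
            continue;
        }
        total += values[i];      /* path hinted as hot */
    }
    return total;
}
```

Whether the hint pays off is exactly what the question asks; the result is the same with or without it, so any difference has to be measured on the target compiler and CPU.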

How to: Inline assembler in C++ (under Visual Studio 2010)

I'm writing a performance-critical, number-crunching C++ project where 70% of the time is used by the 200 line core module. I'd like to optimize the core using inline assembly, but I'm completely new to this. I do, however, know some x86 assembly languages including the one used by GCC and NASM. All I know: I have to put the assembler instructions in _asm{} where I want them to be. Problem: I have no clue where to start. What is in which register at the moment my inline assembly comes into play? You can access variables by their name and copy them to registers. Here's an example from MSDN: int

Difference between “or eax,eax” and “test eax,eax” [duplicate]

Question: This question already has answers here: Test whether a register is zero with CMP reg,0 vs OR reg,reg? (2 answers). Closed last year. What's the difference between or eax,eax and test eax,eax? I've seen different compilers produce both for the same comparison, and as far as the documentation goes they do exactly the same thing, so I'm wondering why they don't all use test eax,eax. Thinking about it, and eax,eax would set the flags in the same way as either, but I haven't seen it in either

x > -1 vs x >= 0, is there a performance difference

I heard a teacher mention this once, and it has been bugging me ever since. Let's say we want to check whether the integer x is greater than or equal to 0. There are two ways to check this: if (x > -1){ //do stuff } and if (x >= 0){ //do stuff } According to this teacher, > would be slightly faster than >= . In this case it was Java, but according to him this also applies to C, C++ and other languages. Is there any truth to this statement? There's no difference in any real-world sense. Let's take a look at some code generated by various compilers for various targets. I'm assuming a signed int
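To reproduce that kind of comparison yourself, the two checks can be isolated in small functions and compiled to assembly (for example with a compiler's `-S` option). The sketch below is an assumed setup rather than the answer's original test code, and the function names are hypothetical. For a signed int the two predicates are logically identical, which `verify_equivalence` checks at the boundary values.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The two spellings of "x is non-negative" from the question. */
static int is_non_negative_gt(int x) { return x > -1; }
static int is_non_negative_ge(int x) { return x >= 0; }

/* Check that both spellings agree on the interesting boundary values. */
static void verify_equivalence(void)
{
    const int samples[] = { INT_MIN, -2, -1, 0, 1, INT_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(is_non_negative_gt(samples[i]) == is_non_negative_ge(samples[i]));
}
```

Because the predicates are equivalent, a compiler is free to emit the same instructions for both; inspecting the generated assembly for your own target is the way to confirm it.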