micro-optimization

Is there a tool to test the conciseness of c program? [closed]

Submitted by 淺唱寂寞╮ on 2019-12-02 20:19:21
Question: Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow. Closed 3 years ago.

For example, I want to check whether the following code can be made more concise:

```c
for (i = 0; i < map->size; i++) {
    if (0 < map->bucket[i].n) {
        p = map->bucket[i].list;
        while (p) {
            h = hash(p->key) % n;
            if (bucket[h].list) {
                new_p = bucket[h].list;
                while (new_p->next)
                    new_p = new_p->next;
                new_p->next = p;
                next = p->next;
```
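As a sketch of one way to make that loop more concise (with hypothetical minimal types standing in for the question's map): instead of walking to the tail of each destination bucket, push each moved node onto the head of its new list, which is O(1) per node:

```c
#include <stddef.h>

/* Hypothetical minimal types standing in for the question's map. */
struct node   { int key; struct node *next; };
struct bucket { struct node *list; };

static unsigned hash(int key) { return (unsigned)key * 2654435761u; }

/* Concise rehash: O(1) head insert per node, instead of walking to
   the tail of each destination list as the original excerpt does. */
static void rehash(struct bucket *old_b, size_t old_n,
                   struct bucket *new_b, size_t new_n)
{
    for (size_t i = 0; i < old_n; i++) {
        struct node *p = old_b[i].list;
        while (p) {
            struct node *next = p->next;  /* save before relinking */
            unsigned h = hash(p->key) % new_n;
            p->next = new_b[h].list;      /* head insert */
            new_b[h].list = p;
            p = next;
        }
        old_b[i].list = NULL;
    }
}
```

Head insertion reverses the relative order of nodes within a bucket, which is fine for a hash map where bucket order carries no meaning.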

How to force GCC to assume that a floating-point expression is non-negative?

Submitted by 人走茶凉 on 2019-12-02 17:50:42
There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does `sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1])` (NB: I am aware of `std::hypot`; this is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for `sqrt(x*x)`:

```asm
        mulss   xmm0, xmm0
        pxor    xmm1, xmm1
        ucomiss xmm1, xmm0
        ja      .L10
        sqrtss  xmm0, xmm0
        ret
.L10:
        jmp     sqrtf
```

That is, it compares the result of `x*x` to zero, and if the result is non-negative, it does the `sqrtss`

Why does Intel's compiler prefer NEG+ADD over SUB?

Submitted by 末鹿安然 on 2019-12-02 17:25:07
In examining the output of various compilers for a variety of code snippets, I've noticed that Intel's C compiler (ICC) has a strong tendency to prefer emitting a pair of NEG+ADD instructions where other compilers would use a single SUB instruction. As a simple example, consider the following C code:

```c
uint64_t Mod3(uint64_t value)
{
    return (value % 3);
}
```

ICC translates this to the following machine code (regardless of optimization level):

```asm
mov rcx, 0xaaaaaaaaaaaaaaab
mov rax, rdi
mul rcx
shr rdx, 1
lea rsi, QWORD PTR [rdx+rdx*2]
neg rsi       ; \ equivalent to:
add rdi, rsi  ; /   sub rdi, rsi
mov rax,
```
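The magic-constant sequence above can be sketched back in C to see what it computes: `0xaaaaaaaaaaaaaaab` is a rounded reciprocal of 3, so the high half of the widening multiply, shifted right once, is `value / 3`; the final NEG+ADD (or SUB) then forms the remainder. This sketch uses the GCC/Clang `unsigned __int128` extension to get the high 64 bits:

```c
#include <stdint.h>

/* C model of ICC's reciprocal-multiplication sequence:
   q = high64(value * 0xaaaaaaaaaaaaaaab) >> 1  equals  value / 3,
   and the NEG+ADD pair, like SUB, computes value - 3*q. */
uint64_t mod3(uint64_t value)
{
    uint64_t hi = (uint64_t)(((unsigned __int128)value
                              * 0xaaaaaaaaaaaaaaabULL) >> 64);
    uint64_t q = hi >> 1;      /* value / 3 */
    return value - 3 * q;      /* value % 3 */
}
```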

Why does n++ execute faster than n=n+1?

Submitted by 孤街醉人 on 2019-12-02 16:15:23
In C, why does `n++` execute faster than `n = n + 1`? (`int n = ...; n++;` versus `int n = ...; n = n + 1;`) Our instructor asked that question in today's class. (This is not homework.)

Betamoo: That would be true only if you were working with a "stone-age" compiler. In the "stone-age" case, `++n` is faster than `n++`, which is faster than `n = n + 1`, because machines usually have an *increment x* instruction as well as an *add const to x* instruction.

- With `n++`, you have only 2 memory accesses (read n, inc n, write n).
- With `n = n + 1`, you have 3 memory accesses (read n, read const, add n and const, write n).

But today's compilers will automatically convert `n = n + 1`
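A minimal pair to feed into a disassembler or compiler explorer; any modern optimizing compiler emits identical code for both functions:

```c
/* Two spellings of the same operation; at -O1 and above, GCC,
   Clang, and MSVC compile these to identical machine code
   (typically a single lea or add). */
int inc_post(int n) { n++;       return n; }
int inc_expr(int n) { n = n + 1; return n; }
```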

Multiplication with constant - imul or shl-add-combination

Submitted by 故事扮演 on 2019-12-02 02:22:13
This question is about how we multiply an integer by a constant. So let's look at a simple function:

```c
int f(int x) { return 10 * x; }
```

How can that function be optimized best, especially when inlined into a caller?

Approach 1 (produced by most optimizing compilers, e.g. on Godbolt):

```asm
lea (%rdi,%rdi,4), %eax
add %eax, %eax
```

Approach 2 (produced by clang 3.6 and earlier, with -O3):

```asm
imul $10, %edi, %eax
```

Approach 3 (produced by g++ 6.2 without optimization, removing stores/reloads):

```asm
mov %edi, %eax
sal $2, %eax
add %edi, %eax
add %eax, %eax
```

Which version is fastest, and why? Primarily interested in
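The shift-and-add decompositions above can be written back in C to confirm they compute the same product: the LEA forms `x + 4*x`, and the final ADD doubles it, giving `10*x`:

```c
/* C model of Approach 1 (and Approach 3): (x + 4*x) * 2 == 10*x.
   Shown for non-negative x; left-shifting a negative int is
   undefined behavior in strict C, though compilers generate the
   same decomposition for signed multiplies internally. */
int times10_shift_add(int x)
{
    return (x + (x << 2)) << 1;
}
```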

How much memory instance of my class uses - pragmatic answer

Submitted by 别说谁变了你拦得住时间么 on 2019-12-01 23:49:59
Question: How big is an instance of the following class after the constructor is called? I guess this can be written generally as size = n·x + c, where x = 4 on x86 and x = 8 on x64. n = ? c = ? Is there some method in .NET which can return this number?

```csharp
class Node
{
    byte[][] a;
    int[] b;
    List<Node> c;

    public Node()
    {
        a = new byte[3][];
        b = new int[3];
        c = new List<Node>(0);
    }
}
```

Answer 1: First of all, this depends on the environment where the program is compiled and run, but if you fix some variables you can get pretty
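The size = n·x + c intuition can be sketched in C, where x is the pointer size. This is only an analogue: a plain C struct has no object header, whereas a .NET object adds header and method-table words, which is what the constant c accounts for:

```c
#include <stddef.h>

/* A C analogue of Node's layout: three reference-sized fields.
   Here x = sizeof(void*) (4 on 32-bit, 8 on 64-bit), n = 3, and
   c = 0, since C structs carry no per-object header. */
struct node {
    void *a;   /* stands in for byte[][] a  */
    void *b;   /* stands in for int[] b     */
    void *c;   /* stands in for List<Node> c */
};

static size_t node_size(void) { return sizeof(struct node); }
```

Note this counts only the references held by the instance itself; the arrays and the list allocated in the constructor are separate heap objects with their own sizes.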

Branch on ?: operator?

Submitted by 烈酒焚心 on 2019-12-01 16:19:25
For a typical modern compiler on modern hardware, will the `?:` operator result in a branch that affects the instruction pipeline? In other words, which is faster: calling both cases to avoid a possible branch,

```cpp
bool testVar = someValue(); // Used later.
purge(white);
purge(black);
```

or picking the one actually needed to be purged and only doing it with the `?:` operator:

```cpp
bool testVar = someValue();
purge(testVar ? white : black);
```

I realize you have no idea how long `purge()` will take, but I'm just asking a general question here about whether I would ever want to call `purge()` twice to avoid a possible
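A small counting sketch (with hypothetical stand-ins for `purge()` and the colors) makes the trade-off concrete: the two-call variant always does the work twice, while the `?:` variant does it once; for a simple operand select like this, compilers typically lower the ternary to a conditional move rather than a branch anyway:

```c
/* Hypothetical stand-in for the question's purge(): it just counts
   how many times it runs, so the two variants can be compared. */
static int purge_count = 0;
static void purge(const char *which) { (void)which; purge_count++; }

/* Variant 1: call both — no conditional at all, but twice the work. */
static int purge_both(const char *white, const char *black)
{
    purge_count = 0;
    purge(white);
    purge(black);
    return purge_count;
}

/* Variant 2: select the argument with ?: — one call; the select
   itself usually becomes a cmov, not a mispredictable branch. */
static int purge_one(int testVar, const char *white, const char *black)
{
    purge_count = 0;
    purge(testVar ? white : black);
    return purge_count;
}
```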

How much faster are SSE4.2 string instructions than SSE2 for memcmp?

Submitted by 点点圈 on 2019-12-01 10:46:44
Here is my code's assembler. Can you embed it in C++ and check it against SSE4? I would very much like to see how much speed the step to SSE4 brought, or whether nobody worries about it at all. Let's check (I do not have support above SSSE3):

```
{ sse2 strcmp WideChar 32 bit }
function CmpSee2(const P1, P2: Pointer; len: Integer): Boolean;
asm
  push ebx          // save ebx
  cmp  EAX, EDX     // Str = Str2
  je   @@true       // to exit true
  test eax, eax     // not Str
  je   @@false      // to exit false
  test edx, edx     // not Str2
  je   @@false      // to exit false
  sub  edx, eax     // Str2 := Str2 - Str
  mov  ebx, [eax]   // get 4 bytes of Str
  xor
```
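For comparison, here is a C sketch of the same SSE2 idea using intrinsics (the names and structure are mine, not from the question): compare 16 bytes per iteration with `_mm_cmpeq_epi8`, check the resulting mask, and fall back to byte compares for the tail:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* SSE2 equality check: returns 1 if the two buffers of length len
   are identical, 0 otherwise. Processes 16 bytes per iteration;
   _mm_movemask_epi8 yields 0xFFFF only when all 16 lanes matched. */
int memeq_sse2(const unsigned char *p1, const unsigned char *p2,
               size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i a  = _mm_loadu_si128((const __m128i *)(p1 + i));
        __m128i b  = _mm_loadu_si128((const __m128i *)(p2 + i));
        __m128i eq = _mm_cmpeq_epi8(a, b);
        if (_mm_movemask_epi8(eq) != 0xFFFF)
            return 0;  /* at least one byte differs in this block */
    }
    for (; i < len; i++)      /* scalar tail */
        if (p1[i] != p2[i])
            return 0;
    return 1;
}
```

The SSE4.2 string instructions (`pcmpistri` and friends) fold the compare, mask test, and index computation into one instruction, which is the speedup the question asks about.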

Is thread time spent in synchronization too high?

Submitted by 大憨熊 on 2019-12-01 04:42:19
Today I profiled one of my C# applications using the Visual Studio 2010 Performance Analyzer. Specifically, I was profiling for "Concurrency" because it seemed as though my app should have more capacity than it was demonstrating. The analysis report showed that the threads were spending ~70-80% of their time in a Synchronization state. To be honest, I'm not sure what this means. Does it mean that the application is suffering from a live-lock condition? For context... there are ~30+ long-running threads bound to a single AppDomain (if that matters) and some of the threads are very busy