micro-optimization

Is there a tool to test the conciseness of c program? [closed]

Submitted by 淺唱寂寞╮ on 2019-12-02 20:19:21
Question: Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow. Closed 3 years ago.

For example, I want to check whether the following code can be made more concise:

```c
for (i = 0; i < map->size; i++) {
    if (0 < map->bucket[i].n) {
        p = map->bucket[i].list;
        while (p) {
            h = hash(p->key) % n;
            if (bucket[h].list) {
                new_p = bucket[h].list;
                while (new_p->next)
                    new_p = new_p->next;
                new_p->next = p;
                next = p->next;
```
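As a sketch of one way to make that loop more concise (with hypothetical minimal types standing in for the question's map): instead of walking to the tail of each destination bucket, push each moved node onto the head of its new list, which is O(1) per node:

```c
#include <stddef.h>

/* Hypothetical minimal types standing in for the question's map. */
struct node   { int key; struct node *next; };
struct bucket { struct node *list; };

static unsigned hash(int key) { return (unsigned)key * 2654435761u; }

/* Concise rehash: O(1) head insert per node, instead of walking to
   the tail of each destination list as the original excerpt does. */
static void rehash(struct bucket *old_b, size_t old_n,
                   struct bucket *new_b, size_t new_n)
{
    for (size_t i = 0; i < old_n; i++) {
        struct node *p = old_b[i].list;
        while (p) {
            struct node *next = p->next;  /* save before relinking */
            unsigned h = hash(p->key) % new_n;
            p->next = new_b[h].list;      /* head insert */
            new_b[h].list = p;
            p = next;
        }
        old_b[i].list = NULL;
    }
}
```

Head insertion reverses the relative order of nodes within a bucket, which is fine for a hash map where bucket order carries no meaning.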

How to force GCC to assume that a floating-point expression is non-negative?

Submitted by 人走茶凉 on 2019-12-02 17:50:42
There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does `sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1])` (NB: I am aware of `std::hypot`; this is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for `sqrt(x*x)`:

```asm
        mulss   xmm0, xmm0
        pxor    xmm1, xmm1
        ucomiss xmm1, xmm0
        ja      .L10
        sqrtss  xmm0, xmm0
        ret
.L10:
        jmp     sqrtf
```

That is, it compares the result of `x*x` to zero, and if the result is non-negative, it does the `sqrtss`

Why does Intel's compiler prefer NEG+ADD over SUB?

Submitted by 末鹿安然 on 2019-12-02 17:25:07
In examining the output of various compilers for a variety of code snippets, I've noticed that Intel's C compiler (ICC) has a strong tendency to prefer emitting a pair of NEG+ADD instructions where other compilers would use a single SUB instruction. As a simple example, consider the following C code:

```c
uint64_t Mod3(uint64_t value)
{
    return (value % 3);
}
```

ICC translates this to the following machine code (regardless of optimization level):

```asm
mov rcx, 0xaaaaaaaaaaaaaaab
mov rax, rdi
mul rcx
shr rdx, 1
lea rsi, QWORD PTR [rdx+rdx*2]
neg rsi       ; \ equivalent to:
add rdi, rsi  ; /   sub rdi, rsi
mov rax,
```
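The magic-constant sequence above can be sketched back in C to see what it computes: `0xaaaaaaaaaaaaaaab` is a rounded reciprocal of 3, so the high half of the widening multiply, shifted right once, is `value / 3`; the final NEG+ADD (or SUB) then forms the remainder. This sketch uses the GCC/Clang `unsigned __int128` extension to get the high 64 bits:

```c
#include <stdint.h>

/* C model of ICC's reciprocal-multiplication sequence:
   q = high64(value * 0xaaaaaaaaaaaaaaab) >> 1  equals  value / 3,
   and the NEG+ADD pair, like SUB, computes value - 3*q. */
uint64_t mod3(uint64_t value)
{
    uint64_t hi = (uint64_t)(((unsigned __int128)value
                              * 0xaaaaaaaaaaaaaaabULL) >> 64);
    uint64_t q = hi >> 1;      /* value / 3 */
    return value - 3 * q;      /* value % 3 */
}
```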

Why does n++ execute faster than n=n+1?

Submitted by 孤街醉人 on 2019-12-02 16:15:23
In C, why does `n++` execute faster than `n = n + 1`? (`int n = ...; n++;` versus `int n = ...; n = n + 1;`) Our instructor asked that question in today's class. (This is not homework.)

Betamoo: That would be true only if you were working with a "stone-age" compiler. In the "stone-age" case, `++n` is faster than `n++`, which is faster than `n = n + 1`, because machines usually have an *increment x* instruction as well as an *add const to x* instruction.

- With `n++`, you have only 2 memory accesses (read n, inc n, write n).
- With `n = n + 1`, you have 3 memory accesses (read n, read const, add n and const, write n).

But today's compilers will automatically convert `n = n + 1`
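A minimal pair to feed into a disassembler or compiler explorer; any modern optimizing compiler emits identical code for both functions:

```c
/* Two spellings of the same operation; at -O1 and above, GCC,
   Clang, and MSVC compile these to identical machine code
   (typically a single lea or add). */
int inc_post(int n) { n++;       return n; }
int inc_expr(int n) { n = n + 1; return n; }
```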

Multiplication with constant - imul or shl-add-combination

Submitted by 故事扮演 on 2019-12-02 02:22:13
This question is about how we multiply an integer by a constant. So let's look at a simple function:

```c
int f(int x) { return 10 * x; }
```

How can that function be optimized best, especially when inlined into a caller?

Approach 1 (produced by most optimizing compilers, e.g. on Godbolt):

```asm
lea (%rdi,%rdi,4), %eax
add %eax, %eax
```

Approach 2 (produced by clang 3.6 and earlier, with -O3):

```asm
imul $10, %edi, %eax
```

Approach 3 (produced by g++ 6.2 without optimization, removing stores/reloads):

```asm
mov %edi, %eax
sal $2, %eax
add %edi, %eax
add %eax, %eax
```

Which version is fastest, and why? Primarily interested in
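The shift-and-add decompositions above can be written back in C to confirm they compute the same product: the LEA forms `x + 4*x`, and the final ADD doubles it, giving `10*x`:

```c
/* C model of Approach 1 (and Approach 3): (x + 4*x) * 2 == 10*x.
   Shown for non-negative x; left-shifting a negative int is
   undefined behavior in strict C, though compilers generate the
   same decomposition for signed multiplies internally. */
int times10_shift_add(int x)
{
    return (x + (x << 2)) << 1;
}
```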

How much memory instance of my class uses - pragmatic answer

Submitted by 别说谁变了你拦得住时间么 on 2019-12-01 23:49:59
Question: How big is an instance of the following class after the constructor is called? I guess this can be written generally as size = n·x + c, where x = 4 on x86 and x = 8 on x64. n = ? c = ? Is there some method in .NET which can return this number?

```csharp
class Node
{
    byte[][] a;
    int[] b;
    List<Node> c;

    public Node()
    {
        a = new byte[3][];
        b = new int[3];
        c = new List<Node>(0);
    }
}
```

Answer 1: First of all, this depends on the environment where the program is compiled and run, but if you fix some variables you can get pretty
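The size = n·x + c intuition can be sketched in C, where x is the pointer size. This is only an analogue: a plain C struct has no object header, whereas a .NET object adds header and method-table words, which is what the constant c accounts for:

```c
#include <stddef.h>

/* A C analogue of Node's layout: three reference-sized fields.
   Here x = sizeof(void*) (4 on 32-bit, 8 on 64-bit), n = 3, and
   c = 0, since C structs carry no per-object header. */
struct node {
    void *a;   /* stands in for byte[][] a  */
    void *b;   /* stands in for int[] b     */
    void *c;   /* stands in for List<Node> c */
};

static size_t node_size(void) { return sizeof(struct node); }
```

Note this counts only the references held by the instance itself; the arrays and the list allocated in the constructor are separate heap objects with their own sizes.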

Branch on ?: operator?

Submitted by 烈酒焚心 on 2019-12-01 16:19:25
For a typical modern compiler on modern hardware, will the `?:` operator result in a branch that affects the instruction pipeline? In other words, which is faster: calling both cases to avoid a possible branch,

```cpp
bool testVar = someValue(); // Used later.
purge(white);
purge(black);
```

or picking the one actually needed to be purged and only doing it with the `?:` operator:

```cpp
bool testVar = someValue();
purge(testVar ? white : black);
```

I realize you have no idea how long `purge()` will take, but I'm just asking a general question here about whether I would ever want to call `purge()` twice to avoid a possible
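A small counting sketch (with hypothetical stand-ins for `purge()` and the colors) makes the trade-off concrete: the two-call variant always does the work twice, while the `?:` variant does it once; for a simple operand select like this, compilers typically lower the ternary to a conditional move rather than a branch anyway:

```c
/* Hypothetical stand-in for the question's purge(): it just counts
   how many times it runs, so the two variants can be compared. */
static int purge_count = 0;
static void purge(const char *which) { (void)which; purge_count++; }

/* Variant 1: call both — no conditional at all, but twice the work. */
static int purge_both(const char *white, const char *black)
{
    purge_count = 0;
    purge(white);
    purge(black);
    return purge_count;
}

/* Variant 2: select the argument with ?: — one call; the select
   itself usually becomes a cmov, not a mispredictable branch. */
static int purge_one(int testVar, const char *white, const char *black)
{
    purge_count = 0;
    purge(testVar ? white : black);
    return purge_count;
}
```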

How much faster are SSE4.2 string instructions than SSE2 for memcmp?

Submitted by 点点圈 on 2019-12-01 10:46:44
Here is my code's assembler. Can you embed it in C++ and check it against SSE4? I would very much like to see how much speed the step to SSE4 brought, or whether nobody worries about it at all. Let's check (I do not have support above SSSE3):

```
{ sse2 strcmp WideChar 32 bit }
function CmpSee2(const P1, P2: Pointer; len: Integer): Boolean;
asm
  push ebx          // save ebx
  cmp  EAX, EDX     // Str = Str2
  je   @@true       // to exit true
  test eax, eax     // not Str
  je   @@false      // to exit false
  test edx, edx     // not Str2
  je   @@false      // to exit false
  sub  edx, eax     // Str2 := Str2 - Str
  mov  ebx, [eax]   // get 4 bytes of Str
  xor
```
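For comparison, here is a C sketch of the same SSE2 idea using intrinsics (the names and structure are mine, not from the question): compare 16 bytes per iteration with `_mm_cmpeq_epi8`, check the resulting mask, and fall back to byte compares for the tail:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* SSE2 equality check: returns 1 if the two buffers of length len
   are identical, 0 otherwise. Processes 16 bytes per iteration;
   _mm_movemask_epi8 yields 0xFFFF only when all 16 lanes matched. */
int memeq_sse2(const unsigned char *p1, const unsigned char *p2,
               size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i a  = _mm_loadu_si128((const __m128i *)(p1 + i));
        __m128i b  = _mm_loadu_si128((const __m128i *)(p2 + i));
        __m128i eq = _mm_cmpeq_epi8(a, b);
        if (_mm_movemask_epi8(eq) != 0xFFFF)
            return 0;  /* at least one byte differs in this block */
    }
    for (; i < len; i++)      /* scalar tail */
        if (p1[i] != p2[i])
            return 0;
    return 1;
}
```

The SSE4.2 string instructions (`pcmpistri` and friends) fold the compare, mask test, and index computation into one instruction, which is the speedup the question asks about.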

Is thread time spent in synchronization too high?

Submitted by 大憨熊 on 2019-12-01 04:42:19
Today I profiled one of my C# applications using the Visual Studio 2010 Performance Analyzer. Specifically, I was profiling for "Concurrency" because it seemed as though my app should have more capacity than it was demonstrating. The analysis report showed that the threads were spending ~70-80% of their time in a Synchronization state. To be honest, I'm not sure what this means. Does it mean that the application is suffering from a live-lock condition? For context... there are ~30+ long-running threads bound to a single AppDomain (if that matters) and some of the threads are very busy