compiler-optimization

Function Template Overload Resolution & Compiler Optimizations

一世执手 submitted on 2019-12-12 12:07:21
Question: I was looking at this question, found here: Template function overload for type containing a type, where the OP user2079802 provided this code for their question: I'm trying to do the following: #include <iostream> #include <vector> #include <tuple> template <typename T> void f(T t) { std::cout << "1" << std::endl; } template <typename T, typename V> void f(T<std::tuple<V>> t) { std::cout << "2" << std::endl; } int main() { f(std::list<double>{}); // should use first template f(std::vector<std
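As a sketch of one way the second overload can be made to deduce (my own illustration; the linked question's answers may differ): the outer type has to be taken as a template template parameter, with a parameter pack to absorb defaulted arguments such as std::vector's allocator.

```cpp
#include <iostream>
#include <tuple>
#include <vector>

// Generic fallback.
template <typename T>
void f(T) { std::cout << "1" << std::endl; }

// C is a template template parameter, so any container whose element type is
// std::tuple<V> (plus defaulted arguments such as the allocator) matches here,
// and partial ordering prefers this overload over the generic one.
template <template <typename...> class C, typename V, typename... Rest>
void f(C<std::tuple<V>, Rest...>) { std::cout << "2" << std::endl; }

int main() {
    f(std::vector<double>{});            // prints 1
    f(std::vector<std::tuple<int>>{});   // prints 2
}
```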

Computation is optimized only if variable updated in loop is local

梦想与她 submitted on 2019-12-12 11:08:36
Question: For the following function, the code is vectorized when optimizations are enabled and the computation is performed in registers (the return value is returned in eax). The generated machine code is, e.g., here: https://godbolt.org/z/VQEBV4. int sum(int *arr, int n) { int ret = 0; for (int i = 0; i < n; i++) ret += arr[i]; return ret; } However, if I make the ret variable global (or a parameter of type int&), vectorization is not used and the compiler stores the updated ret to memory in each iteration.
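The usual workaround, sketched here as my own illustration rather than quoted from the answers: accumulate into a local variable and write the global (or reference) once at the end, so the compiler no longer has to assume that arr might alias ret and preserve every intermediate store.

```cpp
int ret;  // global accumulator

// arr[i] might alias the global ret, so the store to ret must happen on
// every iteration and the loop stays scalar.
void sum_global(int *arr, int n) {
    ret = 0;
    for (int i = 0; i < n; i++)
        ret += arr[i];
}

// Accumulating into a local and storing once removes that constraint; the
// loop can again live entirely in registers and be vectorized.
void sum_local(int *arr, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += arr[i];
    ret = acc;
}
```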

passing rvalue to non-ref parameter, why can't the compiler elide the copy?

落爺英雄遲暮 submitted on 2019-12-12 09:42:31
Question: struct Big { int a[8]; }; void foo(Big a); Big getStuff(); void test1() { foo(getStuff()); } compiles (using clang 6.0.0 for x86_64 on Linux, so System V ABI, flags: -O3 -march=broadwell) to test1(): # @test1() sub rsp, 72 lea rdi, [rsp + 40] call getStuff() vmovups ymm0, ymmword ptr [rsp + 40] vmovups ymmword ptr [rsp], ymm0 vzeroupper call foo(Big) add rsp, 72 ret If I am reading this correctly, this is what is happening: getStuff is passed a pointer to foo's stack (rsp + 40) to use for
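If the extra copy matters in practice, the usual sidestep (a sketch with hypothetical names, not taken from the question) is to take the parameter by reference, so no separate by-value argument slot is needed:

```cpp
struct Big { int a[8]; };

Big getStuff();

// Binding a const reference (or rvalue reference) to the call's result lets
// the caller pass the address of getStuff()'s return object directly, so the
// copy from the return slot into an argument slot should disappear.
void fooRef(const Big &a);

void test2() {
    fooRef(getStuff());
}
```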

Why isn't string concatenation automatically converted to StringBuilder in C#? [duplicate]

穿精又带淫゛_ submitted on 2019-12-12 09:36:14
Question: This question already has answers here: Closed 7 years ago. Possible Duplicate: Why is String.Concat not optimized to StringBuilder.Append? One day I was ranting about a particular Telerik control to a friend of mine. I told him that it took several seconds to generate a controls tree, and after profiling I found out that it was using string concatenation in a loop instead of a StringBuilder. After rewriting, it worked almost instantaneously. So my friend heard that and seemed to be

effect of goto on C++ compiler optimization

随声附和 submitted on 2019-12-12 08:18:31
Question: What are the performance benefits or penalties of using goto with a modern C++ compiler? I am writing a C++ code generator, and use of goto will make it easier to write. No one will touch the resulting C++ files, so don't get all "goto is bad" on me. As a benefit, gotos save the use of temporary variables. I was wondering, from a purely compiler-optimization perspective, what effect goto has on the compiler's optimizer. Does it make code faster, slower, or generally no change in
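For intuition, a sketch of my own (not from the answers): the optimizer works on a control-flow graph, and a loop spelled with goto typically lowers to the same graph as the structured equivalent, so a modern compiler usually emits identical code for both.

```cpp
// Both functions describe the same control-flow graph; an optimizing
// compiler will usually generate the same machine code for each.
int sum_while(const int *a, int n) {
    int s = 0, i = 0;
    while (i < n) { s += a[i]; ++i; }
    return s;
}

int sum_goto(const int *a, int n) {
    int s = 0, i = 0;
loop:
    if (i >= n) goto done;
    s += a[i];
    ++i;
    goto loop;
done:
    return s;
}
```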

What is my compiler doing? (optimizing memcpy)

断了今生、忘了曾经 submitted on 2019-12-12 07:50:36
Question: I'm compiling a bit of code using the following settings in VC++ 2010: /O2 /Ob2 /Oi /Ot. However, I'm having some trouble understanding some parts of the generated assembly; I have put some questions in the code as comments. Also, what prefetching distance is generally recommended on modern CPUs? I can of course test on my own CPU, but I was hoping for some value that will work well on a wider range of CPUs. Maybe one could use dynamic prefetching distances? EDIT: Another thing I'm surprised about
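On the prefetch-distance part: there is no single value that is right for every CPU, since it depends on memory latency and on how much work each iteration does; the usual approach is to make the prefetch explicit and tune the distance. A sketch (the function name and the 256-byte distance are placeholders of mine, not recommendations):

```cpp
#include <cstddef>
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0

// Byte-copy loop with a software prefetch a fixed distance ahead.
void copyWithPrefetch(char *dst, const char *src, std::size_t n) {
    const std::size_t PREFETCH_DISTANCE = 256;  // bytes ahead: tune per target
    for (std::size_t i = 0; i < n; ++i) {
        if ((i & 63) == 0 && i + PREFETCH_DISTANCE < n)  // once per cache line
            _mm_prefetch(src + i + PREFETCH_DISTANCE, _MM_HINT_T0);
        dst[i] = src[i];
    }
}
```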

Avoid stalling pipeline by calculating conditional early

瘦欲@ submitted on 2019-12-12 07:26:51
Question: When talking about the performance of ifs, we usually talk about how mispredictions can stall the pipeline. The recommended solutions I see are: trust the branch predictor for conditions that usually have one result; avoid branching with a little bit of bit-magic if reasonably possible; or use conditional moves where possible. What I couldn't find was whether or not we can calculate the condition early to help where possible. So, instead of: ... work if (a > b) { ... more work } Do something
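A sketch of the transformation being asked about (doWork and doMoreWork are hypothetical stand-ins): hoist the comparison above the independent work so its result is available well before the branch is reached.

```cpp
void doWork();      // hypothetical independent work
void doMoreWork();  // hypothetical conditional work

void example(int a, int b) {
    // Evaluate the condition as soon as its inputs are available...
    bool extra = a > b;

    // ...then do the independent work; by the time the branch is reached the
    // compare has long since resolved.  Whether this helps in practice
    // depends on the CPU's out-of-order window and how predictable the
    // branch is.
    doWork();

    if (extra)
        doMoreWork();
}
```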

How to tell whether common subexpression elimination is happening in GHC?

半腔热情 submitted on 2019-12-12 03:33:24
Question: Let's say I have a naively implemented function like this: quadratic a b c = (ans1, ans2) where ans1 = ((-b) + sqrt (b * b - 4 * a * c)) / (2 * a) ans2 = ((-b) - sqrt (b * b - 4 * a * c)) / (2 * a) There are multiple identical subexpressions. How can I tell, without reading Core, whether common subexpression elimination is happening, and to which parts of this it applies? Answer 1: Using trace might tell you, as demonstrated in this SO question. import Debug.Trace quadratic a b c = (ans1, ans2) where ans1 =

How to increase stack size in Bloodshed Dev-C++?

你离开我真会死。 submitted on 2019-12-12 03:11:16
Question: We are using Bloodshed Dev-C++ in an image processing project. We are implementing connected component labelling on a video frame. We have to use a recursive function which recurses so many times that we get a stack overflow. How can we have a larger stack size? Is it possible to change it through some linker parameters or anything similar? void componentLabel(int i,int j,IplImage *img){ // blueFrame = img->imageData[i*3*width+j*3]; // greenFrame = img->imageData[i*3*width+j*3+1]; //
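With the MinGW toolchain that Dev-C++ ships, the stack reserve can usually be raised by passing a linker parameter such as -Wl,--stack,16777216 in the project's linker options (the exact dialog varies by version). A more robust fix is to remove the deep recursion entirely; below is a sketch with hypothetical width/height/labels state standing in for the project's own, and with the pixel-comparison logic omitted:

```cpp
#include <stack>
#include <utility>

// Hypothetical stand-ins for the project's own image state.
extern int width, height;
extern unsigned char *labels;   // 0 = not yet labelled

// Iterative 4-connected flood fill: the worklist lives on the heap, so call
// depth no longer grows with the size of the component.
// (The pixel-similarity test against img->imageData is omitted here.)
void componentLabelIterative(int i, int j, unsigned char label) {
    std::stack<std::pair<int, int> > work;
    work.push(std::make_pair(i, j));
    while (!work.empty()) {
        std::pair<int, int> p = work.top();
        work.pop();
        int y = p.first, x = p.second;
        if (y < 0 || y >= height || x < 0 || x >= width) continue;
        if (labels[y * width + x] != 0) continue;   // already visited
        labels[y * width + x] = label;
        work.push(std::make_pair(y + 1, x));
        work.push(std::make_pair(y - 1, x));
        work.push(std::make_pair(y, x + 1));
        work.push(std::make_pair(y, x - 1));
    }
}
```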

Green Hills compiler: turn off optimization for a file or part of it

别来无恙 submitted on 2019-12-12 03:04:56
Question: I found several code snippets for disabling GCC optimization for specific parts of the code, e.g. with #pragma GCC optimize(0). But I could not find anything like that for the Green Hills compiler. Is there no such option? Answer 1: From the manual: #pragma ghs Ostring Turns on optimizations. The optional string may contain any or all of the following letters: L — Loop optimizations M — Memory optimizations S — Small (but Slow) optimizations #pragma ghs ZO Disables all optimizations, starting from the next
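Based only on the manual text quoted in the answer (I have not verified this on a Green Hills toolchain), a sketch of how those pragmas might bracket part of a file:

```cpp
// Sketch only: pragma spellings taken from the manual excerpt above;
// check the Green Hills documentation for your toolchain version.
#pragma ghs ZO        // disable all optimizations from here on

void doNotOptimizeMe()
{
    // code compiled without optimization
}

#pragma ghs OLMS      // turn loop, memory and small/slow optimizations back on
```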