compiler-optimization

Why do common C compilers include the source filename in the output?

穿精又带淫゛_ submitted on 2019-12-04 02:41:39
I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled. I can't really understand why this is a good idea. Besides the small privacy risk, this also happens when one optimizes for the size of the resulting binary ( -Os ), which seems wasteful. Why do the compilers include this information? cyphar: The reason GCC includes the filename is mainly for debugging purposes: it allows a programmer to identify which source file a given symbol comes from, as (tersely) outlined

.Max() vs OrderByDescending().First()

只谈情不闲聊 submitted on 2019-12-04 02:17:28
This is purely for my own knowledge; if I were actually writing the code I would just use .Max() . At first thought, .Max() only has to do a single pass through the numbers to find the maximum, while the second approach has to sort the entire enumerable and then take the first element. So it's O(n) vs O(n lg n) . But then I was thinking maybe it knows it only needs the highest value and just grabs it. Question: Is LINQ and/or the compiler smart enough to figure out that it doesn't need to sort the entire enumerable, boiling the code down to essentially the same as .Max()? Is there a quantifiable way to find out?

Is it guaranteed that Complex Float variables will be 8-byte aligned in memory?

末鹿安然 submitted on 2019-12-04 02:04:13
Question: In C99 the new complex types were defined. I am trying to understand whether a compiler can take advantage of this knowledge in optimizing memory accesses. Are these objects ( A - F ) of type complex float guaranteed to be 8-byte aligned in memory?

    #include <complex.h>
    typedef complex float cfloat;

    cfloat A;
    cfloat B[10];

    void func(cfloat C, cfloat *D)
    {
        cfloat E;
        cfloat F[10];
    }

Note that for D , the question relates to the object pointed to by D , not to the pointer storage itself. And, if
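A minimal check one can compile (my own sketch, not code from the question): C99 specifies that each complex type has the same representation and alignment requirements as an array of two elements of the corresponding real type, so the only guaranteed alignment is that of float, commonly 4 bytes rather than 8. The snippet below uses the C++ analogue std::complex<float> to print what a given implementation actually provides.

    #include <complex>
    #include <cstdio>

    int main()
    {
        // alignof reports this implementation's actual alignment; the standard
        // only guarantees the alignment of the element type float (commonly 4).
        std::printf("alignof(std::complex<float>) = %zu\n", alignof(std::complex<float>));
        std::printf("alignof(float[2])            = %zu\n", alignof(float[2]));
        return 0;
    }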

Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by compiler)

拟墨画扇 submitted on 2019-12-04 01:15:39
Question: The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract), whose functionality is explained by this image (source here): it takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1 . My question is the following: what would be the equivalent code, as an optimized templated function in pure standard C++11, that compilers are likely to optimize to this instruction in the future? Answer 1: Here is some code from Matthew
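As a hedged sketch of my own (not the code from the answer; the name pext_fallback is invented here purely for illustration), the function below reproduces PEXT's semantics in portable C++11 for an unsigned integer type T. It isolates the lowest set bit of the mask on each pass and packs the corresponding bit of the value into the low end of the result. Whether any compiler will pattern-match this into a single PEXT instruction on BMI2 targets is not guaranteed.

    #include <cassert>
    #include <cstdint>

    template <typename T>
    T pext_fallback(T value, T mask)
    {
        T result = 0;
        T out_bit = 1;
        while (mask != 0) {
            T lowest = mask & static_cast<T>(~mask + 1); // lowest set bit of the mask
            if (value & lowest)
                result |= out_bit;                       // copy that bit of the value
            out_bit <<= 1;                               // to the next free low bit
            mask &= mask - 1;                            // clear the bit and continue
        }
        return result;
    }

    int main()
    {
        // 0xDA = 0b11011010, mask 0x74 = 0b01110100 selects bits 2, 4, 5, 6,
        // whose values 0, 1, 0, 1 pack into 0b1010 = 0xA.
        assert(pext_fallback<std::uint32_t>(0xDAu, 0x74u) == 0xAu);
        return 0;
    }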

Are C# anonymous types redundant in C# 7

北城以北 submitted on 2019-12-04 00:56:02
Since C# 7 introduces value tuples, is there a meaningful scenario where anonymous types are still better suited than tuples? For example, the following line

    collection.Select((x, i) => (x, i)).Where(y => arr[y.i].f(y.x)).ToArray();

makes the following line

    collection.Select((x, i) => new {x, i}).Where(y => arr[y.i].f(y.x)).ToArray();

redundant. What would be the use cases where one is better used over the other (whether for performance or for other reasons)? Obviously, if there is a need for more than six fields, tuples cannot be used, but is there something a bit more nuanced to it? There are various

Crash in C++ code due to undefined behaviour or compiler bug?

余生颓废 submitted on 2019-12-04 00:51:56
I am experiencing strange crashes, and I wonder whether it is a bug in my code or in the compiler. When I compile the following C++ code with Microsoft Visual Studio 2010 as an optimized release build, it crashes at the marked line:

    struct tup { int x; int y; };

    class C {
    public:
        struct tup* p;
        struct tup* operator--() { return --p; }
        struct tup* operator++(int) { return p++; }
        virtual void Reset() { p = 0; }
    };

    int main ()
    {
        C c;
        volatile int x = 0;
        struct tup v1;
        struct tup v2 = {0, x};
        c.p = &v1;
        (*(c++)) = v2;
        struct tup i = (*(--c)); // crash! (dereferencing a NULL pointer)
        return i.x;
    }

LLVM and the future of optimization

自闭症网瘾萝莉.ら submitted on 2019-12-03 23:45:57
I realize that LLVM has a long way to go, but theoretically, can the optimizations that exist in GCC/ICC/etc. for individual languages be applied to LLVM bytecode? If so, does this mean that any language that compiles to LLVM bytecode has the potential to be equally fast? Or are language-specific optimizations (before the LLVM bytecode stage) always going to play a large part in optimizing any specific program? I don't know much about compilers or optimizations (only enough to be dangerous), so I apologize if this question isn't well defined. In general, no. For example, in Haskell a common

effect of goto on C++ compiler optimization

 ̄綄美尐妖づ submitted on 2019-12-03 22:27:13
What are the performance benefits or penalties of using goto with a modern C++ compiler? I am writing a C++ code generator, and using goto will make it easier to write. No one will touch the resulting C++ files, so don't get all "goto is bad" on me. As a benefit, gotos save the use of temporary variables. I was wondering, from a purely compiler-optimization perspective, what effect goto has on the compiler's optimizer. Does it make code faster, slower, or is there generally no change in performance compared to using temporaries / flags? The part of a compiler that would be affected works with a
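As a rough illustration of the comparison the question asks about (my own sketch, not code from the question or the answer): the two functions below express the same control flow, once with goto and once with a flag variable. An optimizer works on the control-flow graph it builds from either form, so in practice it tends to produce very similar, often identical, machine code for both.

    // goto-based early exit, as a code generator might emit it
    int with_goto(int x)
    {
        if (x < 0)
            goto fail;
        x *= 2;
        if (x > 100)
            goto fail;
        return x;
    fail:
        return -1;
    }

    // the same logic written with a temporary flag instead of goto
    int with_flag(int x)
    {
        bool failed = false;
        if (x < 0)
            failed = true;
        if (!failed) {
            x *= 2;
            if (x > 100)
                failed = true;
        }
        return failed ? -1 : x;
    }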

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

好久不见. submitted on 2019-12-03 21:53:35
I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a , but the call pow(a,6) is not optimized and will actually call the library function pow , which greatly slows down the performance. (In contrast, the Intel C++ Compiler, executable icc , will eliminate the library call for pow(a,6) .) What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and the options " -O3 -lm -funroll-loops -msse4 ", it uses 5 mulsd instructions:

    movapd %xmm14, %xmm13
    mulsd  %xmm14, %xmm13
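Two commonly suggested ways around this, shown here as a sketch of my own (not code from the question): regroup the multiplications by hand, or compile with -ffast-math (or -funsafe-math-optimizations) so GCC is allowed to re-associate floating-point math. Under strict IEEE semantics, a*a*a*a*a*a and (a*a*a)*(a*a*a) are not guaranteed to give bit-identical results, which is why GCC will not regroup them on its own.

    #include <cmath>

    // Hand-regrouped: three multiplications instead of five.
    double pow6_regrouped(double a)
    {
        double a3 = a * a * a;
        return a3 * a3;
    }

    // Library form: without fast-math-style options GCC keeps the call to pow
    // for an exponent of 6, even though it expands pow(a, 2) to a*a.
    double pow6_library(double a)
    {
        return std::pow(a, 6.0);
    }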

Will 30 GOTO 10 always go to 10?

半腔热情 submitted on 2019-12-03 21:32:29
In the spirit of the latest podcast where Joel mentioned he'd like some simple questions with possibly interesting answers ... In the environments we have to program in today, we can't rely on the order of execution of our language statements. Is that true? Should we be concerned? Will 30 GOTO 10 always go to 10?* *I didn't use 20 on purpose ;) [edit] For the four people voting to close this question ... "Runtime compilers use profiling information to help optimize the code being compiled. The JVM is permitted to use information specific to the execution in order to produce better code