compiler-optimization

Why do common C compilers include the source filename in the output?

穿精又带淫゛_ submitted on 2019-12-04 02:41:39
I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled. I can't really understand why this is a good idea. Besides the small privacy risk, this also happens when one optimizes for the size of the resulting binary ( -Os ), which seems wasteful. Why do the compilers include this information? cyphar: The reason GCC includes the filename is mainly for debugging purposes: it allows a programmer to identify which source file a given symbol comes from, as (tersely) outlined

.Max() vs OrderByDescending().First()

只谈情不闲聊 submitted on 2019-12-04 02:17:28
This is purely for my own knowledge; if I were actually writing the code I would just use .Max() . At first thought, .Max() only has to do a single pass through the numbers to find the maximum, while the second approach has to sort the entire enumerable and then take the first element. So it's O(n) vs O(n lg n) . But then I was thinking maybe it knows it only needs the highest value and just grabs it. Question: Is LINQ and/or the compiler smart enough to figure out that it doesn't need to sort the entire enumerable, boiling the code down to essentially the same as .Max()? Is there a quantifiable way to find out?

Is it guaranteed that Complex Float variables will be 8-byte aligned in memory?

末鹿安然 submitted on 2019-12-04 02:04:13
Question: In C99 the new complex types were defined. I am trying to understand whether a compiler can take advantage of this knowledge in optimizing memory accesses. Are these objects ( A - F ) of type complex float guaranteed to be 8-byte aligned in memory?

    #include <complex.h>
    typedef complex float cfloat;

    cfloat A;
    cfloat B[10];

    void func(cfloat C, cfloat *D)
    {
        cfloat E;
        cfloat F[10];
    }

Note that for D , the question relates to the object pointed to by D , not to the pointer storage itself. And, if
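A minimal check one can compile (my own sketch, not code from the question): C99 specifies that each complex type has the same representation and alignment requirements as an array of two elements of the corresponding real type, so the only guaranteed alignment is that of float, commonly 4 bytes rather than 8. The snippet below uses the C++ analogue std::complex<float> to print what a given implementation actually provides.

    #include <complex>
    #include <cstdio>

    int main()
    {
        // alignof reports this implementation's actual alignment; the standard
        // only guarantees the alignment of the element type float (commonly 4).
        std::printf("alignof(std::complex<float>) = %zu\n", alignof(std::complex<float>));
        std::printf("alignof(float[2])            = %zu\n", alignof(float[2]));
        return 0;
    }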

Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by compiler)

拟墨画扇 submitted on 2019-12-04 01:15:39
Question: The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract), whose functionality is explained by this image (source here): it takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1 . My question is the following: what would be the equivalent code, as an optimized templated function in pure standard C++11, that compilers are likely to optimize to this instruction in the future? Answer 1: Here is some code from Matthew
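As a hedged sketch of my own (not the code from the answer; the name pext_fallback is invented here purely for illustration), the function below reproduces PEXT's semantics in portable C++11 for an unsigned integer type T. It isolates the lowest set bit of the mask on each pass and packs the corresponding bit of the value into the low end of the result. Whether any compiler will pattern-match this into a single PEXT instruction on BMI2 targets is not guaranteed.

    #include <cassert>
    #include <cstdint>

    template <typename T>
    T pext_fallback(T value, T mask)
    {
        T result = 0;
        T out_bit = 1;
        while (mask != 0) {
            T lowest = mask & static_cast<T>(~mask + 1); // lowest set bit of the mask
            if (value & lowest)
                result |= out_bit;                       // copy that bit of the value
            out_bit <<= 1;                               // to the next free low bit
            mask &= mask - 1;                            // clear the bit and continue
        }
        return result;
    }

    int main()
    {
        // 0xDA = 0b11011010, mask 0x74 = 0b01110100 selects bits 2, 4, 5, 6,
        // whose values 0, 1, 0, 1 pack into 0b1010 = 0xA.
        assert(pext_fallback<std::uint32_t>(0xDAu, 0x74u) == 0xAu);
        return 0;
    }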

Are C# anonymous types redundant in C# 7

北城以北 submitted on 2019-12-04 00:56:02
Since C# 7 introduces value tuples, is there a meaningful scenario where anonymous types are still better suited than tuples? For example, the following line

    collection.Select((x, i) => (x, i)).Where(y => arr[y.i].f(y.x)).ToArray();

makes the following line

    collection.Select((x, i) => new {x, i}).Where(y => arr[y.i].f(y.x)).ToArray();

redundant. What would be the use cases where one is better used over the other (whether for performance or for other reasons)? Obviously, if there is a need for more than six fields, tuples cannot be used, but is there something a bit more nuanced to it? There are various

Crash in C++ code due to undefined behaviour or compiler bug?

余生颓废 submitted on 2019-12-04 00:51:56
I am experiencing strange crashes, and I wonder whether it is a bug in my code or in the compiler. When I compile the following C++ code with Microsoft Visual Studio 2010 as an optimized release build, it crashes at the marked line:

    struct tup { int x; int y; };

    class C {
    public:
        struct tup* p;
        struct tup* operator--() { return --p; }
        struct tup* operator++(int) { return p++; }
        virtual void Reset() { p = 0; }
    };

    int main ()
    {
        C c;
        volatile int x = 0;
        struct tup v1;
        struct tup v2 = {0, x};
        c.p = &v1;
        (*(c++)) = v2;
        struct tup i = (*(--c)); // crash! (dereferencing a NULL pointer)
        return i.x;
    }

LLVM and the future of optimization

自闭症网瘾萝莉.ら submitted on 2019-12-03 23:45:57
I realize that LLVM has a long way to go, but theoretically, can the optimizations that exist in GCC/ICC/etc. for individual languages be applied to LLVM bytecode? If so, does this mean that any language that compiles to LLVM bytecode has the potential to be equally fast? Or are language-specific optimizations (before the LLVM bytecode stage) always going to play a large part in optimizing any specific program? I don't know much about compilers or optimizations (only enough to be dangerous), so I apologize if this question isn't well defined. In general, no. For example, in Haskell a common

effect of goto on C++ compiler optimization

 ̄綄美尐妖づ submitted on 2019-12-03 22:27:13
What are the performance benefits or penalties of using goto with a modern C++ compiler? I am writing a C++ code generator, and using goto will make it easier to write. No one will touch the resulting C++ files, so don't get all "goto is bad" on me. As a benefit, gotos save the use of temporary variables. I was wondering, from a purely compiler-optimization perspective, what effect goto has on the compiler's optimizer. Does it make code faster, slower, or is there generally no change in performance compared to using temporaries / flags? The part of a compiler that would be affected works with a
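As a rough illustration of the comparison the question asks about (my own sketch, not code from the question or the answer): the two functions below express the same control flow, once with goto and once with a flag variable. An optimizer works on the control-flow graph it builds from either form, so in practice it tends to produce very similar, often identical, machine code for both.

    // goto-based early exit, as a code generator might emit it
    int with_goto(int x)
    {
        if (x < 0)
            goto fail;
        x *= 2;
        if (x > 100)
            goto fail;
        return x;
    fail:
        return -1;
    }

    // the same logic written with a temporary flag instead of goto
    int with_flag(int x)
    {
        bool failed = false;
        if (x < 0)
            failed = true;
        if (!failed) {
            x *= 2;
            if (x > 100)
                failed = true;
        }
        return failed ? -1 : x;
    }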

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

好久不见. submitted on 2019-12-03 21:53:35
I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a , but the call pow(a,6) is not optimized and will actually call the library function pow , which greatly slows down the performance. (In contrast, the Intel C++ Compiler, executable icc , will eliminate the library call for pow(a,6) .) What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and the options " -O3 -lm -funroll-loops -msse4 ", it uses 5 mulsd instructions:

    movapd %xmm14, %xmm13
    mulsd  %xmm14, %xmm13
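Two commonly suggested ways around this, shown here as a sketch of my own (not code from the question): regroup the multiplications by hand, or compile with -ffast-math (or -funsafe-math-optimizations) so GCC is allowed to re-associate floating-point math. Under strict IEEE semantics, a*a*a*a*a*a and (a*a*a)*(a*a*a) are not guaranteed to give bit-identical results, which is why GCC will not regroup them on its own.

    #include <cmath>

    // Hand-regrouped: three multiplications instead of five.
    double pow6_regrouped(double a)
    {
        double a3 = a * a * a;
        return a3 * a3;
    }

    // Library form: without fast-math-style options GCC keeps the call to pow
    // for an exponent of 6, even though it expands pow(a, 2) to a*a.
    double pow6_library(double a)
    {
        return std::pow(a, 6.0);
    }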

Will 30 GOTO 10 always go to 10?

半腔热情 submitted on 2019-12-03 21:32:29
In the spirit of the latest podcast where Joel mentioned he'd like some simple questions with possibly interesting answers ... In the environments we have to program in today, we can't rely on the order of execution of our language statements. Is that true? Should we be concerned? Will 30 GOTO 10 always go to 10?* *I didn't use 20 on purpose ;) [edit] For the four people voting to close this question ... "Runtime compilers use profiling information to help optimize the code being compiled. The JVM is permitted to use information specific to the execution in order to produce better code