compiler-optimization

Will 30 GOTO 10 always go to 10?

别等时光非礼了梦想. Submitted on 2019-12-09 14:03:59
Question: In the spirit of the latest podcast, where Joel mentioned he'd like some simple questions with possibly interesting answers ... In the environments we have to program in today, we can't rely on the order of execution of our language statements. Is that true? Should we be concerned? Will 30 GOTO 10 always go to 10?* *I didn't use 20 on purpose ;) [edit] For the four people voting for closure of this question ... "Runtime compilers use profiling information to help optimize the code being…
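The rule at the heart of this question is the "as-if" rule: a compiler may reorder, fuse, or delete statements freely, as long as the program's observable behaviour is unchanged, and control transfers like GOTO are part of that behaviour. A minimal C++ sketch of the distinction (names are illustrative, not from the question):

    #include <cstdio>

    volatile int sink; // volatile accesses are "observable", so their order is fixed

    void demo(int a, int b) {
        int x = a * 2;  // the compiler may compute these two lines in either
        int y = b * 3;  // order, or fold them away entirely...
        sink = x;       // ...but these two volatile stores must occur
        sink = y;       // in exactly this order
    }

    int main() {
        demo(1, 2);
        std::printf("%d\n", sink); // always prints 6
    }

So: yes, individual statements may be reordered under the hood, but 30 GOTO 10 must still behave exactly as if it went to 10.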

HotSpot JIT optimizations

柔情痞子 Submitted on 2019-12-08 22:49:21
Question: For a lecture about the JIT in HotSpot I want to give as many examples as possible of the specific optimizations the JIT performs. I know only about "method inlining", but there must be much more. Please give one example per answer, so each can be voted on.

Answer 1: Well, you should scan Brian Goetz's articles for examples. In brief, HotSpot can and will:

- Inline methods
- Join adjacent synchronized blocks on the same object
- Eliminate locks if the monitor is not reachable from other threads
- Eliminate dead code (hence most micro…
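The dead-code point above is exactly why naive microbenchmarks mislead, and the effect is easy to reproduce in an ahead-of-time compiler as well. A hypothetical C++ analogue (not from the answer):

    #include <chrono>
    #include <cstdio>

    int main() {
        auto t0 = std::chrono::steady_clock::now();
        long long acc = 0;
        for (int i = 0; i < 100000000; ++i)
            acc += i;            // `acc` is never used afterwards, so an
        (void)acc;               // optimizer may delete the entire loop
        auto t1 = std::chrono::steady_clock::now();
        std::printf("%lld ns\n", (long long)
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }

At -O2, GCC and Clang typically report near-zero time here: the loop is dead and gets removed, which is the same trap the answer is warning about for HotSpot.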

Do any C or C++ compilers optimize within define macros?

被刻印的时光 ゝ Submitted on 2019-12-08 16:18:20
Question: Let's say I have the following in C or C++:

    #include <math.h>

    #define ROWS 15
    #define COLS 16
    #define COEFF 0.15
    #define NODES (ROWS*COLS)
    #define A_CONSTANT (COEFF*(sqrt(NODES)))

Then I go and use NODES and A_CONSTANT somewhere deep within many nested loops (i.e. they are used many times). Clearly, both have numeric values that can be ascertained at compile time, but do compilers actually do it? At run time, will the CPU have to evaluate 15*16 every time it sees NODES, or will the compiler…
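For the integer part the answer is yes: NODES expands to (15*16), and constant folding of integer arithmetic happens at every optimization level, so no run-time multiply survives. A_CONSTANT is less certain, because sqrt is a library call; GCC and Clang usually fold it anyway, but the portable way to guarantee compile-time evaluation in C++ is constexpr. A sketch under that assumption (the Newton-iteration helper is purely illustrative, since std::sqrt is not required to be usable in constant expressions):

    constexpr int rows  = 15;
    constexpr int cols  = 16;
    constexpr int nodes = rows * cols;   // guaranteed to be folded to 240

    // A small compile-time square root, since std::sqrt is not portably
    // constexpr (illustrative only):
    constexpr double sqrt_ce(double x, double guess = 1.0, int iter = 0) {
        return iter == 30 ? guess
                          : sqrt_ce(x, (guess + x / guess) / 2.0, iter + 1);
    }

    constexpr double a_constant = 0.15 * sqrt_ce(nodes);  // evaluated at compile time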

Is this incorrect code generation with arrays of __m256 values a clang bug?

帅比萌擦擦* Submitted on 2019-12-08 15:26:17
Question: I'm encountering what appears to be a bug causing incorrect code generation with clang 3.4, 3.5, and 3.6 trunk. The source that actually triggered the problem is quite complicated, but I've been able to reduce it to this self-contained example:

    #include <iostream>
    #include <immintrin.h>
    #include <string.h>

    struct simd_pack {
        enum { num_vectors = 1 };
        __m256i _val[num_vectors];
    };

    simd_pack load_broken(int8_t *p) {
        simd_pack pack;
        for (int i = 0; i < simd_pack::num_vectors; ++i)
            pack._val[i] = …
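(The excerpt is cut off at the load itself.) Whatever the exact intrinsic was, a common defensive rewrite when a vector load from a narrower pointer type miscompiles is to go through memcpy, which is always well-defined and still compiles down to a single unaligned vector move. A hypothetical sketch (function name invented, assumes AVX2):

    #include <immintrin.h>
    #include <stdint.h>
    #include <string.h>

    struct simd_pack {
        enum { num_vectors = 1 };
        __m256i _val[num_vectors];
    };

    // memcpy is defined for any object representation; at -O2, clang and
    // gcc turn this into a single vmovdqu per vector.
    simd_pack load_via_memcpy(const int8_t *p) {
        simd_pack pack;
        for (int i = 0; i < simd_pack::num_vectors; ++i)
            memcpy(&pack._val[i], p + 32 * i, sizeof(__m256i));
        return pack;
    }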

gcc: is there no tail recursion if I return std::string in C++?

别等时光非礼了梦想. Submitted on 2019-12-08 02:23:43
Question: As per my answer to Write a recursive function that reverses the input string, I've tried checking whether clang++ -O3 or g++ -O3 would make a tail-recursion optimisation, using some of the suggestions from How do I check if gcc is performing tail-recursion optimization?, but it doesn't look like any tail-recursion optimisation is taking place. Any idea why? Does this have to do with the way C++ objects are created and destroyed? Is there any way to make it work? The program:

    % cat t2.cpp
    …
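The usual culprit is the one the asker suspects: when a function returns std::string by value, the caller still has destructors (and often a move into the result) to run after the recursive call returns, so the call is not in tail position and neither GCC nor Clang can turn it into a jump. A sketch of an accumulator-style rewrite whose recursive call really is a tail call (illustrative, not the asker's t2.cpp):

    #include <iostream>
    #include <string>

    // No work remains after the recursive call: void return, reference
    // parameters, nothing to destroy, so -O2 can compile it as a loop.
    void reverse_acc(const std::string &in, std::size_t i, std::string &out) {
        if (i == in.size()) return;
        out.insert(out.begin(), in[i]); // prepend happens before the call
        reverse_acc(in, i + 1, out);    // genuine tail call
    }

    std::string reverse(const std::string &in) {
        std::string out;
        out.reserve(in.size());
        reverse_acc(in, 0, out);
        return out;                     // NRVO, no extra copy
    }

    int main() { std::cout << reverse("hello") << '\n'; } // prints "olleh"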

adding “-march=native” Intel compiler flag to the compilation line leads to a floating point exception on KNL

前提是你 Submitted on 2019-12-08 02:06:00
Question: I have a code which I launch on an Intel Xeon Phi Knights Landing (KNL) 7210 (64-core) processor (a PC, in native mode), using the Intel C++ compiler (icpc) version 17.0.4. I also launch the same code on an Intel Core i7 processor, where the icpc version is 17.0.1. To be precise, I compile the code on the same machine I launch it on (compiled on the i7 and launched on the i7, and likewise for the KNL); I never build the binary on one machine and bring it to another. The loops are parallelized and…
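One practical way to corner a floating-point exception that only appears with -march=native (most likely the AVX-512 code paths on KNL) is to unmask FP traps so the process faults on the exact instruction, then inspect it in a debugger or core dump. A debugging sketch using the glibc extension feenableexcept (assumes Linux with a GNU-compatible toolchain; not from the question):

    #include <cfenv>   // feenableexcept is a glibc extension, not ISO C++
    #include <cstdio>

    int main() {
        // Turn masked IEEE exceptions into SIGFPE at the faulting instruction.
        feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

        volatile double zero = 0.0;
        double r = 1.0 / zero;   // SIGFPE raised here instead of producing inf
        std::printf("%f\n", r);  // never reached
    }

Run under gdb with traps enabled and the stop lands on the offending (possibly vectorized) operation rather than wherever the bad value is eventually consumed.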

Can Java inline a large method if most of it would be dead code at the call site?

̄綄美尐妖づ Submitted on 2019-12-07 17:49:22
问题 I know that one of the criteria that Java HotSpot uses to decide whether a method is worth inlining is how large it the method is. On one hand, this seems sensible: if the method is large, in-lining leads to code bloat and the method would take so long to execute that the call overhead is trivial. The trouble with this logic is that it might turn out that AFTER you decide to inline, it becomes clear that for this particular call-site, most of the method is dead code. For instance, the method
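For comparison, ahead-of-time C++ compilers handle this by running constant propagation after inlining: once the call site is inlined with a known argument, every branch that cannot be taken is deleted, and a "large" method shrinks to a few instructions at that site. An illustrative C++ sketch of the effect the question is asking HotSpot to anticipate:

    #include <cstdio>

    // Looks big to an inlining heuristic, but with a constant `mode`
    // all arms except one are provably dead after inlining.
    static int process(int mode, int x) {
        if (mode == 0) return x + 1;
        if (mode == 1) return x * 2;
        if (mode == 2) return x * x;
        // ...imagine dozens more branches here...
        return -x;
    }

    int main() {
        // At -O2 this typically folds to printing the constant 10:
        // process() is inlined, mode == 1 is known, the rest vanishes.
        std::printf("%d\n", process(1, 5));
    }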

Where can I modify detailed C# compiler optimization settings in Visual Studio?

本小妞迷上赌 Submitted on 2019-12-07 15:20:42
Question: In Visual Studio C/C++ projects, it's easy to modify the compiler's optimization settings under "Property Pages | C/C++ | Optimization". For example, we can set optimization levels such as /O2 and /O3, as well as advanced optimizations like "Omit Frame Pointers". However, I can't find the corresponding UI in a Visual Studio C# project. All I can find is a way to turn optimization on or off: the "Optimize code" check box is all I've got. Can C# users control detailed compiler…

Why don't LLVM passes optimize floating point instructions? [duplicate]

巧了我就是萌 Submitted on 2019-12-07 11:05:15
Question: This question already has answers here: Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? (12 answers). Closed 6 years ago.

See above. I wrote two sample functions:

    source.ll:

    define i32 @bleh(i32 %x) {
    entry:
      %addtmp = add i32 %x, %x
      %addtmp1 = add i32 %addtmp, %x
      %addtmp2 = add i32 %addtmp1, %x
      %addtmp3 = add i32 %addtmp2, %x
      %addtmp4 = add i32 %addtmp3, 1
      %addtmp5 = add i32 %addtmp4, 2
      %addtmp6 = add i32 %addtmp5, 3
      %multmp = mul i32 %x, 3
      %addtmp7 = add i32 %addtmp6, %multmp
      ret …
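The duplicate link is the heart of the answer: integer addition is associative, so LLVM's reassociation and instruction-combining passes happily rewrite the chain above, but floating-point addition and multiplication are not, and the passes may only touch them when the instructions carry fast-math flags (`fast` on the IR; -ffast-math and friends from the driver). A C++ sketch of the same experiment (file name hypothetical):

    // pow6.cpp
    // clang++ -O3 -S pow6.cpp              -> typically 5 sequential multiplies
    // clang++ -O3 -ffast-math -S pow6.cpp  -> typically 3 multiplies, (a*a*a)^2
    double pow6(double a) {
        // Without fast-math the compiler must keep left-to-right order,
        // because (a*a*a)*(a*a*a) can round differently.
        return a * a * a * a * a * a;
    }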

Why not allow common subexpression elimination on const nonvolatile member functions?

送分小仙女□ Submitted on 2019-12-07 09:56:36
Question: One of the goals of C++ is to allow user-defined types to behave as nicely as built-in types. One place where this seems to fail is compiler optimization. If we assume that a const nonvolatile member function is the moral equivalent of a read (for a user-defined type), then why not allow a compiler to eliminate repeated calls to such a function? For example:

    class C {
        ...
    public:
        int get() const;
    };

    int main() {
        C c;
        int x{c.get()};
        x = c.get(); // why not allow the compiler to eliminate…
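The short answer is that const promises not to modify the object through this, not that the result is stable: get() could read a mutable member, a global, or a file, so the compiler must assume two calls can differ. GCC and Clang let you grant the missing guarantee explicitly; a sketch using the non-standard pure attribute (assumes a GNU-compatible compiler):

    class C {
    public:
        // Declaration only: without the attribute the optimizer must emit
        // two calls in sum() below. `pure` promises the result depends only
        // on the arguments (including *this) and readable memory, with no
        // side effects, so repeated calls may be merged by CSE.
        __attribute__((pure)) int get() const;
    };

    int sum(const C &c) {
        return c.get() + c.get(); // with `pure`: typically one call at -O2
    }

The C++11 spelling [[gnu::pure]] works as well; there is no standard equivalent, which is essentially what the question is complaining about.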