compiler-optimization | 易学教程

What x86 32 bit peepholes does GCC perform?

Question: I've been browsing through the GCC source code and I'm stumped on how to extract these. Can anyone provide a list, or information on how to extract these peepholes (assembly rewrite optimizations)? GCC code: https://github.com/gcc-mirror/gcc

Edit: To clarify, a "peephole" is defined as a find-and-replace pattern with some associated side conditions for the rewrite to be valid (often just some register/flags liveness information).

Answer 1: Look in the various *.md files and search for…
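To make that definition concrete, here is a toy model of a single peephole rule in C++. GCC itself writes such rules in its machine-description `.md` files (for example as `define_peephole2` patterns over RTL), not like this; the `Insn` struct, the text-based matching, and the precomputed flags-liveness bit are all hypothetical simplifications. The pattern is `mov REG, 0`, the replacement is `xor REG, REG`, and the side condition is that the flags are dead afterwards, because `xor` writes the flags and `mov` does not.

```cpp
#include <string>
#include <vector>

// Hypothetical toy instruction: Intel-syntax text plus one liveness bit that an
// earlier (not shown) dataflow pass is assumed to have computed.
struct Insn {
    std::string text;       // e.g. "mov eax, 0"
    bool flags_live_after;  // are the condition flags read later?
};

// One toy peephole rule:  mov REG, 0  ->  xor REG, REG
// Side condition: xor clobbers the flags, so rewrite only when they are dead.
std::vector<Insn> peephole_zero_via_xor(const std::vector<Insn>& in) {
    const std::string prefix = "mov ";
    const std::string suffix = ", 0";
    std::vector<Insn> out;
    out.reserve(in.size());
    for (const Insn& insn : in) {
        const std::string& t = insn.text;
        bool matches = t.size() > prefix.size() + suffix.size() &&
                       t.compare(0, prefix.size(), prefix) == 0 &&
                       t.compare(t.size() - suffix.size(), suffix.size(), suffix) == 0;
        if (matches && !insn.flags_live_after) {
            std::string reg = t.substr(prefix.size(),
                                       t.size() - prefix.size() - suffix.size());
            // Only plain registers qualify; "mov [mem], 0" has no xor equivalent.
            if (reg.find('[') == std::string::npos) {
                out.push_back({"xor " + reg + ", " + reg, false});
                continue;
            }
        }
        out.push_back(insn);  // pattern or side condition failed: keep as-is
    }
    return out;
}
```

In a real compiler the liveness information comes from dataflow analysis and the match is done on RTL, but the shape (pattern, replacement, side condition) is the same.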

Why does a const int not get optimized by the compiler (through the symbol table) if another pointer points to its reference?

Question: This is a follow-up to an answer in this question: What kind of optimization does const offer in C/C++? (if any) The top-voted answer states the following: When you declare a const in your program, int const x = 2; the compiler can optimize away this const by not providing storage for this variable and instead adding it to the symbol table. So subsequent reads just need indirection into the symbol table rather than instructions to fetch the value from memory. NOTE: If you do something like below: const…
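A minimal sketch of the situation being asked about, assuming a namespace-scope `const int` whose address is taken (`px` and `read_both` are hypothetical names). Taking the address means `x` must have an address, so it does get storage; but since modifying a const object is undefined behavior, an optimizing compiler is still free to replace direct reads of `x` with the constant `2`. The pointer forces the object to exist, not every read to go through memory.

```cpp
// A namespace-scope const int; in C++ this has internal linkage.
const int x = 2;

// Taking &x means x needs real storage (it must have an address) ...
const int* const px = &x;

int read_both() {
    // ... but because modifying a const object is undefined behavior, the
    // compiler may still replace this direct read of x with the constant 2.
    // The read through px may also be folded, since px is a known constant.
    return x + *px;
}
```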

-O2 in ICC messes up assembler, fine with -O1 in ICC and all optimizations in GCC / Clang

Question: I recently started using ICC (18.0.1.126) to compile code that worked fine with GCC and Clang at arbitrary optimization settings. The code contains an assembler routine that multiplies 4x4 matrices of doubles using AVX2 and FMA instructions. After much fiddling, it turned out that the assembler routine works properly when compiled with -O1 -xcore-avx2, but gives a wrong numerical result when compiled with -O2 -xcore-avx2. The code compiles, however, without any error messages on…
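The assembler routine itself is not shown, so the cause cannot be pinned down from this excerpt. One hedged suggestion: a frequent culprit when inline assembly breaks only at higher optimization levels is an asm statement that does not declare everything it reads, writes, or clobbers, which lets the optimizer reorder or drop surrounding loads and stores. Whatever the cause, a plain C++ reference product is a cheap way to check the routine's output at each optimization level. The sketch assumes row-major storage of 16 doubles; `Mat4`, `matmul_ref`, and `max_abs_diff` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Mat4 = std::array<double, 16>;  // row-major 4x4 (assumed layout)

// Reference C = A * B with no intrinsics or asm, for cross-checking.
Mat4 matmul_ref(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k) {
                s += a[i * 4 + k] * b[k * 4 + j];
            }
            c[i * 4 + j] = s;
        }
    }
    return c;
}

// Largest element-wise difference between two matrices.
double max_abs_diff(const Mat4& x, const Mat4& y) {
    double m = 0.0;
    for (int i = 0; i < 16; ++i) {
        m = std::max(m, std::fabs(x[i] - y[i]));
    }
    return m;
}
```

Compare against a small tolerance rather than for exact equality: an FMA-based kernel rounds once per fused multiply-add, so its last bits can legitimately differ from the separate multiply and add used here.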

Julia Method Error converting Complex{Float64}

Question: I'm new to Julia and I have the following code, which fails with this error: MethodError(convert,(Complex{Float64},[-1.0 - 1.0im])). I would like to know the source of the error and how to optimize this piece of code for speed. This is my code:

    function OfdmSym()
        N = 64
        n = 1000
        symbol = convert(Array{Complex{Float64},2},ones(n,64)) # I need Array{Complex{Float64},2}
        data = convert(Array{Complex{Float64},2},ones(1,48)) # I need Array{Complex{Float64},2}
        const unused = convert(Array{Complex{Float64},2}…

No deadlock unless linked to pthreads?

Question: Why is it that creating a std::mutex deadlock will not actually cause a deadlock unless the program is linked to pthreads? The following will deadlock when linked with the pthreads library and will not deadlock if pthreads is not linked in. Tested on gcc and clang.

    // clang++ main.cpp -std=c++14 -lpthread
    #include <mutex>
    int main() {
        std::mutex mtx;
        mtx.lock();
        mtx.lock();
        return 0;
    }

I understand that without a thread library you don't actually need mutex functionality, but is the compiler…
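Locking a `std::mutex` that the calling thread already owns is undefined behavior, so neither the deadlock nor its absence is guaranteed. The difference described is commonly attributed to libstdc++ checking at run time (via weak symbols) whether the pthreads library is linked in and, if it is not, skipping the underlying pthread calls; treat that as background, not as behavior to rely on. If re-entrant locking is actually wanted, `std::recursive_mutex` is the standard type that permits it. A minimal sketch (the global `g_mtx` and `nested_lock` are hypothetical names):

```cpp
#include <mutex>

// Hypothetical shared lock; recursive_mutex keeps an ownership count.
std::recursive_mutex g_mtx;

// Re-locks the same mutex at every recursion level. With std::mutex this
// would be undefined behavior (and typically a deadlock once pthreads is active).
int nested_lock(int levels) {
    std::lock_guard<std::recursive_mutex> guard(g_mtx);
    if (levels == 0) {
        return 0;
    }
    return 1 + nested_lock(levels - 1);  // each level unlocks on return
}
```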

How to set ICC attribute “fp-model precise” for a single function, to prevent associative optimizations?

Question: I am implementing Kahan summation in a project that supports compilation with gcc47, gcc48, clang33, icc13, and icc14. As part of this algorithm, I would like to disable optimizations that take advantage of the associativity of addition of real numbers. (Floating-point operations are not associative.) I would like to disable those optimizations only in the relevant function. I have figured out how to do this under gcc, using the "no-associative-math" attribute. How can I do this in icc or…
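For reference, a minimal Kahan (compensated) summation in C++. The compensation term `c` is algebraically zero, so any optimization allowed to reassociate floating-point addition (for example GCC/Clang `-ffast-math`, or ICC's default `-fp-model fast`) may simplify it away; that is why the question wants associative optimizations disabled. This sketch assumes it is compiled without such flags and does not show the per-function ICC setting being asked about; `kahan_sum` and `naive_sum` are hypothetical names.

```cpp
#include <vector>

// Kahan summation. Must NOT be compiled with reassociating FP flags
// (e.g. -ffast-math), or the compensation below can be optimized out.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;           // running compensation for lost low-order bits
    for (double x : xs) {
        double y = x - c;     // apply the correction from the previous step
        double t = sum + y;   // low-order bits of y may be lost here ...
        c = (t - sum) - y;    // ... and are recovered here
        sum = t;
    }
    return sum;
}

// Plain left-to-right summation, for comparison.
double naive_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) {
        sum += x;
    }
    return sum;
}
```

Adding ten copies of `1e-16` to `1.0` shows the difference: each naive addition rounds back to exactly `1.0`, while the compensated sum ends up above `1.0`.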

Why can't clang and gcc optimize away this int-to-float conversion?

Question: Consider the following code:

    void foo(float* __restrict__ a) {
        int i;
        float val;
        for (i = 0; i < 100; i++) {
            val = 2 * i;
            a[i] = val;
        }
    }

    void bar(float* __restrict__ a) {
        int i;
        float val = 0.0;
        for (i = 0; i < 100; i++) {
            a[i] = val;
            val += 2.0;
        }
    }

They're based on Examples 7.26a and 7.26b in Agner Fog's Optimizing software in C++ and should do the same thing; bar is more "efficient" as written in the sense that we don't do an integer-to-float conversion at every iteration, but rather a…
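One reason the compiler does not rewrite `foo` into `bar` on its own: `bar` replaces an exact per-iteration conversion with repeated floating-point addition, and the two agree only while every intermediate value is exactly representable. For 100 iterations they do agree, but without fast-math-style permission a compiler would have to prove that for the specific trip count. The sketch below (hypothetical name `first_divergence`, assuming IEEE-754 single precision with round-to-nearest-even and no excess precision) finds the first index where the two methods disagree.

```cpp
// Returns the first i in [0, n) where float(2*i) differs from a float
// accumulator that starts at 0 and adds 2.0f each step, or -1 if none.
// Assumes IEEE-754 binary32 floats and round-to-nearest-even.
long long first_divergence(long long n) {
    float acc = 0.0f;                                  // bar-style running value
    for (long long i = 0; i < n; ++i) {
        float direct = static_cast<float>(2 * i);      // foo-style conversion
        if (direct != acc) {
            return i;
        }
        acc += 2.0f;
    }
    return -1;
}
```

Above 2^25 the spacing between adjacent floats is 4, so `acc + 2.0f` is an exact tie that rounds back to `acc`, and the accumulator stops growing while the direct conversion keeps going; the first mismatch is at `i = 2^24 + 2 = 16777218`.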

Saving memory and compile time

Question: Is there any way to save memory and compile time in Perl using modules? For example, by not loading all of the unnecessary, unused subs? Or is it a good idea to split my subs into many different .pm files and then load only the necessary modules? For example:

    #!/usr/bin/perl -w
    sub mysub1() { use MySubsGroup1; }
    sub mysub2() { use MySubsGroup2; }

Does this solution use less memory and take less compile time? Or what is the best practice for loading only the necessary functions?

Answer 1: From perldoc autouse: autouse -…

Loop is not vectorized when variable extent is used

Question: The Version A code is not vectorized, while the Version B code is. How can I make Version A vectorize while keeping the variable extents (without using literal extents)? The nested loop performs multiplication with broadcasting, as in Python's NumPy library and in MATLAB. A description of broadcasting in NumPy is here. Version A code (no std::vector, no vectorization): this only uses imull (%rsi), %edx in .L169, which is not a SIMD instruction. (gcc godbolt)

    #include <iostream>
    #include <stdint.h>…
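The Version A code is truncated here, so this is not a fix for it, only a sketch of a broadcast multiply with run-time extents written in a form that GCC and Clang generally manage to auto-vectorize: the innermost loop is unit-stride, the extents are plain values, and `__restrict` (a GCC/Clang extension) tells the compiler the arrays do not alias. The shape convention, a row-major `(rows x cols)` matrix times a length-`cols` vector broadcast along rows, and the name `mul_broadcast_rows` are assumptions. Whether it actually vectorizes depends on compiler version and flags (e.g. `-O3`, or `-O2` on recent GCC releases), so check the assembly or GCC's `-fopt-info-vec` report.

```cpp
#include <cstddef>
#include <cstdint>

// out[r][c] = a[r][c] * b[c] for a row-major (rows x cols) matrix `a` and a
// length-`cols` vector `b` broadcast across rows. Extents are run-time values.
void mul_broadcast_rows(const std::int32_t* __restrict a,
                        const std::int32_t* __restrict b,
                        std::int32_t* __restrict out,
                        std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        const std::int32_t* arow = a + r * cols;
        std::int32_t* orow = out + r * cols;
        // Unit-stride inner loop over non-aliasing arrays: a SIMD candidate.
        for (std::size_t c = 0; c < cols; ++c) {
            orow[c] = arow[c] * b[c];
        }
    }
}
```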

Can modern compilers optimize constant expressions where the expression is derived from a function?

Question: It is my understanding that modern C++ compilers take shortcuts on things like:

    if (true) { do stuff }

But what about something like:

    bool foo() { return true; }
    ...
    if (foo()) { do stuff }

Or:

    class Functor {
    public:
        bool operator()() { return true; }
    };
    ...
    Functor f;
    if (f()) { do stuff }

Answer 1: It depends on whether the compiler can see foo() in the same compilation unit. With optimization enabled, if foo() is in the same compilation unit as the callers, it will probably inline the call to foo() and then…
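Inlining and constant-folding `foo()` or `Functor::operator()` is an optimization the compiler may or may not perform. If the guarantee matters, C++ can make it explicit: marking the function `constexpr` allows calls to it in constant expressions, and `if constexpr` (C++17) requires its condition to be a constant expression, so the branch is chosen at compile time. A minimal sketch reusing the question's `foo` and `Functor` names with `constexpr` added (`do_stuff` is a hypothetical name; requires C++17):

```cpp
// constexpr versions of the question's examples (C++17 for `if constexpr`).
constexpr bool foo() { return true; }

struct Functor {
    constexpr bool operator()() const { return true; }
};

// Compile-time checks: these fail to compile if the value isn't a constant.
static_assert(foo(), "foo() is usable in a constant expression");
static_assert(Functor{}(), "so is the functor call");

int do_stuff() {
    if constexpr (foo()) {   // condition must be a compile-time constant
        return 1;            // the branch that is kept
    } else {
        return 0;            // discarded statement
    }
}
```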