compiler-optimization | 易学教程

Why gcc autovectorization does not work on convolution matrix bigger than 3x3?

Question: I've implemented the following program for a convolution matrix:

    #include <stdio.h>
    #include <time.h>

    #define NUM_LOOP 1000
    #define N 128      // input or output dimension 1
    #define M N        // input or output dimension 2
    #define P 5        // convolution matrix dimension 1; for a 3x3 convolution matrix it must be 3
    #define Q P        // convolution matrix dimension 2
    #define Csize P*Q
    #define Cdiv 1     // divisor for the filter
    #define Coffset 0  // offset

    // functions
    void unusual(); // unusual implementation of convolution
    void

Why can't gcc devirtualize this function call?

Question:

    #include <cstdio>
    #include <cstdlib>

    struct Interface {
        virtual void f() = 0;
    };

    struct Impl1: Interface {
        void f() override { std::puts("foo"); }
    };

    // or __attribute__ ((visibility ("hidden")))/anonymous namespace
    static Interface* const ptr = new Impl1;

    int main() { ptr->f(); }

When compiled with g++-7 -O3 -flto -fdevirtualize-at-ltrans -fipa-pta -fuse-linker-plugin, the above ptr->f() call cannot be devirtualized. It seems that no external library can modify ptr. Is this a deficiency

Is GCC's option -O2 breaking this small program or do I have undefined behavior [duplicate]

Question: This question already has answers here: Decrementing a pointer out of bounds; incrementing it into bounds [duplicate] (3 answers); Why is out-of-bounds pointer arithmetic undefined behaviour? (7 answers). Closed 5 years ago. I found this problem in a very large application and have made an SSCCE from it. I don't know whether the code has undefined behavior or whether -O2 breaks it. When compiled with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror, it prints 5. But it prints 25 when compiled without -O2

g++ compiler flag to minimize binary size

Question: I have an Arduino Uno R3. I'm making logical objects for each of my sensors using C++. The Arduino has very limited on-board memory (32KB*), and, on average, my compiled objects are coming out at around 6KB*. I am already using the smallest possible data types, in an attempt to minimize my memory footprint. Is there a compiler flag to minimize the size of the binary, or do I need to use shorter variable and function names, fewer functions, etc. to minimize my code base? Also, any other

Return value optimizations and side-effects

Question: Return value optimization (RVO) is an optimization technique involving copy elision, which eliminates the temporary object created to hold a function's return value in certain situations. I understand the benefit of RVO in general, but I have a couple of questions. The standard says the following about it in §12.8, paragraph 32 of this working draft (emphasis mine). When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the
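To make the interaction between copy elision and side effects concrete, here is a minimal sketch. The type `Tracked` and the helper functions are hypothetical names chosen for illustration, not taken from the question. The copy constructor increments a counter, so whether the copy was elided becomes observable. Since C++17, returning a prvalue like this is guaranteed not to copy; under earlier standards the elision is only permitted, although mainstream compilers perform it by default.

```cpp
// Hypothetical type whose copy constructor has a visible side effect.
struct Tracked {
    static int copies;                     // counts copy-constructor calls
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }  // side effect that elision can skip
};
int Tracked::copies = 0;

// Returns a prvalue; the result can be constructed directly in the caller's object.
Tracked make_tracked() {
    return Tracked();
}

// Reports how many copies were made while initializing a local from the return value.
int count_copies_when_returning() {
    Tracked::copies = 0;
    Tracked t = make_tracked();
    (void)t;  // silence unused-variable warnings
    return Tracked::copies;
}
```

If the copy is elided, `count_copies_when_returning()` reports zero copies, which means the side effect in the copy constructor never runs. That is exactly why code should not rely on such side effects.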

Bug only occurring when compile optimization enabled

Question: I came across a bug in code that only reproduces when the code is built with optimizations enabled. I've made a console app that replicates the logic for testing (code below). You'll see that when optimization is enabled, 'value' becomes null after execution of this invalid logic: if ((value == null || value == new string[0]) == false) The fix is straightforward and is commented out below the offending code. But... I'm more concerned that I may have come across a bug in the assembler or

Using Assembly Language in C/C++

Question: I remember reading somewhere that, to really optimize and speed up certain sections of code, programmers write those sections in assembly language. My questions are: Is this practice still followed, and how does one do it? Isn't writing in assembly language a bit too cumbersome and archaic? When we compile C code (with or without the -O3 flag), the compiler does some code optimization, links all libraries, and converts the code to a binary object file. So when we run the program it is already in its most

C++ : How can I know the size of Base class SubObject?

Question: Here I was discussing Empty Base Optimization, and MSalters made this interesting comment: No class can ever have sizeof(Class)==0, empty or not. But we're talking specifically about the size of an empty base class subobject. It doesn't need its own vtable, nor a vtable pointer. Assume the common layout of a vtable pointer at offset 0; that would cause the zero-sized base class subobject to share its vtable pointer with the derived class. No problem: those should be identical anyway, that's
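The distinction between the size of a complete empty class and the size of an empty base class subobject can be observed indirectly with `sizeof`. The sketch below uses hypothetical, non-polymorphic types (`Empty`, `Derived`, `Holder`), so it illustrates only the plain Empty Base Optimization and not the vtable-pointer sharing described in the comment. It assumes a common ABI in which a standard-layout class with an empty base places its first member at offset 0.

```cpp
#include <cstddef>

struct Empty {};                        // no data members, no virtual functions

struct Derived : Empty { int value; };  // Empty as a base class subobject

struct Holder {                         // Empty as a member subobject
    Empty e;
    int value;
};

// A complete object always has a nonzero size, even for an empty class.
constexpr std::size_t size_of_empty = sizeof(Empty);

// With the Empty Base Optimization, the base subobject adds no storage.
constexpr std::size_t size_of_derived = sizeof(Derived);

// A member of type Empty needs its own address, so it costs space plus padding.
constexpr std::size_t size_of_holder = sizeof(Holder);
```

Comparing `size_of_derived` with `sizeof(int)` shows whether the base subobject contributed any storage, while `size_of_holder` shows the cost of holding the same empty type as a member instead.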

How to use if condition in intrinsics

Question: I want to compare two floating-point variables using intrinsics. If the comparison is true, do something; otherwise, do something else. I want to do this as a normal if..else condition. Is there any way to do this using intrinsics?

    // normal code
    vector<float> v1, v2;
    for (int i = 0; i < v1.size(); ++i)
        if (v1[i] < v2[i]) {
            // do something
        } else {
            // do something
        }

How to do this using SSE2 or AVX?

Answer 1: SIMD conditional operations are done with branchless techniques. You use a packed-compare instruction to get a vector
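A minimal sketch of the branchless approach follows, assuming an x86 target with SSE available. The function name `select_lt` and the particular "then" and "else" operations (add and subtract) are hypothetical stand-ins for the "do something" placeholders in the question. The packed compare produces an all-ones or all-zeros mask per lane, and the mask is then used to blend the two candidate results with bitwise operations.

```cpp
#include <cstddef>
#include <immintrin.h>  // x86 SSE/SSE2 intrinsics

// Hypothetical operation: out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i]
void select_lt(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);     // unaligned loads of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 mask = _mm_cmplt_ps(va, vb);  // per lane: all ones if a < b, else zero
        __m128 then_v = _mm_add_ps(va, vb);  // result for lanes where the test is true
        __m128 else_v = _mm_sub_ps(va, vb);  // result for lanes where the test is false
        // Blend: (mask & then_v) | (~mask & else_v)
        __m128 res = _mm_or_ps(_mm_and_ps(mask, then_v),
                               _mm_andnot_ps(mask, else_v));
        _mm_storeu_ps(out + i, res);
    }
    // Scalar tail for the remaining 0..3 elements.
    for (; i < n; ++i) {
        out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i];
    }
}
```

Both branches are computed for every lane and the mask picks one, so this pattern suits cheap, side-effect-free branches. On targets with SSE4.1 or AVX, a blend instruction could replace the and/andnot/or sequence.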

Determine optimization level in preprocessor?

Question: -Og is a relatively new optimization option that is intended to improve the debugging experience while still applying optimizations. If a user selects -Og, then I'd like my source files to activate alternate code paths to enhance the debugging experience. GCC offers the __OPTIMIZE__ preprocessor macro, but it is only set to 1 when optimizations are in effect. Is there a way to learn the optimization level, like -O1, -O3 or -Og, for use with the preprocessor?

Answer 1: I believe this is not possible to