compiler-optimization

Why does GCC autovectorization not work on convolution matrices bigger than 3x3?

Submitted by ↘锁芯ラ on 2019-12-18 21:04:27
Question: I've implemented the following program for a convolution matrix:

    #include <stdio.h>
    #include <time.h>
    #define NUM_LOOP 1000
    #define N 128        // input or output dimension 1
    #define M N          // input or output dimension 2
    #define P 5          // convolution matrix dimension 1; for a 3x3 convolution matrix it must be 3
    #define Q P          // convolution matrix dimension 2
    #define Csize P*Q
    #define Cdiv 1       // divisor for filter
    #define Coffset 0    // offset

    // functions
    void unusual();      // unusual implementation of convolution
    void
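A minimal sketch of the loop shape GCC's autovectorizer handles well (this function is an assumption for illustration, not the asker's code; it keeps the excerpt's P=5 kernel):

```cpp
#include <cassert>

// The innermost loop runs over contiguous memory with a fixed trip count,
// which is the shape GCC's vectorizer (-O3, inspect with -fopt-info-vec)
// handles best; larger kernels often fail the vectorizer's cost model
// rather than being impossible to vectorize.
constexpr int P = 5;  // kernel dimension, as in the excerpt

void conv2d(const float* __restrict in, const float* __restrict kernel,
            float* __restrict out, int n, int m) {
    for (int i = 0; i <= n - P; ++i)
        for (int j = 0; j <= m - P; ++j) {
            float acc = 0.0f;
            for (int ki = 0; ki < P; ++ki)
                for (int kj = 0; kj < P; ++kj)  // contiguous inner loop
                    acc += in[(i + ki) * m + (j + kj)] * kernel[ki * P + kj];
            out[i * m + j] = acc;
        }
}
```

`__restrict` tells GCC the buffers do not alias, which is often the missing precondition for vectorizing convolution loops.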

Why can't gcc devirtualize this function call?

Submitted by 心已入冬 on 2019-12-18 18:36:23
Question:

    #include <cstdio>
    #include <cstdlib>

    struct Interface { virtual void f() = 0; };
    struct Impl1 : Interface {
        void f() override { std::puts("foo"); }
    };
    // or __attribute__((visibility("hidden"))) / anonymous namespace
    static Interface* const ptr = new Impl1;

    int main() { ptr->f(); }

When compiled with g++-7 -O3 -flto -fdevirtualize-at-ltrans -fipa-pta -fuse-linker-plugin, the above ptr->f() call is not devirtualized. It seems that no external library can modify ptr. Is this a deficiency
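A hedged sketch of the usual workaround, not the asker's exact program (f() returns int here so the call is testable): marking the implementation `final` lets the compiler prove no further override exists, so it can devirtualize without whole-program analysis of `ptr`.

```cpp
#include <cassert>

struct Interface {
    virtual int f() = 0;
    virtual ~Interface() = default;
};

// 'final' on the class (or on f()) guarantees no other override can exist.
struct Impl1 final : Interface {
    int f() override { return 42; }
};

int call(Impl1* p) {
    return p->f();  // static type is final: GCC can emit a direct call
}
```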

Is GCC's option -O2 breaking this small program or do I have undefined behavior [duplicate]

Submitted by 懵懂的女人 on 2019-12-18 12:56:20
Question: This question already has answers here: "Decrementing a pointer out of bounds; incrementing it into bounds" [duplicate] (3 answers); "Why is out-of-bounds pointer arithmetic undefined behaviour?" (7 answers). Closed 5 years ago. I found this problem in a very large application and have made an SSCCE from it. I don't know whether the code has undefined behavior or -O2 breaks it. When compiled with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5, but it prints 25 when compiled without -O2
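The asker's code isn't fully shown, but the duplicate targets describe the pattern: moving a pointer below the start of an array is undefined even if it is never dereferenced, and -O2 is allowed to exploit that. A sketch of the pattern and a well-defined rewrite (both functions are illustrative, not from the question):

```cpp
#include <cassert>

// UB pattern: the loop's final --p forms a pointer one before the array,
// which is undefined behavior even though it is never dereferenced.
int sum_backwards_bad(const int* a, int n) {
    int s = 0;
    for (const int* p = a + n - 1; p >= a; --p)  // last --p computes a-1: UB
        s += *p;
    return s;
}

// Well-defined rewrite: the index never leaves [0, n], and a+n
// (one past the end) is the only out-of-range pointer ever formed,
// which the standard explicitly permits.
int sum_backwards_ok(const int* a, int n) {
    int s = 0;
    for (int i = n; i-- > 0; )
        s += a[i];
    return s;
}
```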

g++ compiler flag to minimize binary size

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-18 10:56:10
Question: I have an Arduino Uno R3. I'm making logical objects for each of my sensors using C++. The Arduino has very limited on-board memory (32KB*), and, on average, my compiled objects come out at around 6KB*. I am already using the smallest possible data types required, in an attempt to minimize my memory footprint. Is there a compiler flag to minimize the size of the binary, or do I need to use shorter variable and function names, fewer functions, etc. to minimize my code base? Also, any other
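A commonly used combination of size-reduction flags for AVR targets, sketched as a build command (all flags are real GCC options; the file names are placeholders):

```shell
# -Os optimizes for size; -ffunction-sections/-fdata-sections plus
# --gc-sections let the linker drop unreferenced code and data;
# exceptions and RTTI are usually disabled on 32KB-class parts.
avr-g++ -Os -flto -ffunction-sections -fdata-sections \
        -fno-exceptions -fno-rtti \
        sensor.cpp -o sensor.elf -Wl,--gc-sections
avr-size sensor.elf   # report the resulting section sizes
```

Note that identifier length has no effect on binary size: names are discarded at compile time (apart from debug info and symbols, which `strip` removes).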

Return value optimizations and side-effects

Submitted by 落爺英雄遲暮 on 2019-12-18 10:34:01
Question: Return value optimization (RVO) is an optimization technique involving copy elision, which eliminates the temporary object created to hold a function's return value in certain situations. I understand the benefit of RVO in general, but I have a couple of questions. The standard says the following about it in §12.8, paragraph 32 of this working draft (emphasis mine): When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the
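The side-effect question the excerpt leads into can be made concrete with an illustrative type (not from the question) whose copy constructor has a visible side effect; elision really does skip it:

```cpp
#include <cassert>

struct Tracked {
    static inline int copies = 0;          // requires C++17
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }  // side effect skipped by elision
};

Tracked make() {
    return Tracked{};  // prvalue return: elision is mandatory since C++17
}
```

Before C++17 this elision was merely permitted (the clause the excerpt quotes), so the copy constructor's side effect could legally run or not, depending on the compiler.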

Bug occurring only when compiler optimization is enabled

Submitted by 牧云@^-^@ on 2019-12-18 10:17:03
Question: I came across a bug in code that is only reproduced when the code is built with optimizations enabled. I've made a console app that replicates the logic for testing (code below). You'll see that when optimization is enabled, 'value' becomes null after execution of this invalid logic:

    if ((value == null || value == new string[0]) == false)

The fix is straightforward and is commented out below the offending code. But... I'm more concerned that I may have come across a bug in the assembler or

Using Assembly Language in C/C++

Submitted by 吃可爱长大的小学妹 on 2019-12-18 10:05:27
Question: I remember reading somewhere that to really optimize and speed up a certain section of code, programmers write that section in assembly language. My questions are: Is this practice still done, and how does one do it? Isn't writing in assembly language a bit too cumbersome and archaic? When we compile C code (with or without the -O3 flag), the compiler does some code optimization, links all libraries, and converts the code to a binary object file. So when we run the program it is already in its most
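The usual modern mechanism is GCC/Clang extended inline assembly embedded in the C/C++ source. A minimal sketch (the x86 branch is an assumption about the build machine, with a portable fallback so it compiles anywhere):

```cpp
#include <cassert>

int add_asm(int a, int b) {
#if defined(__x86_64__) || defined(__i386__)
    int r;
    // The "0" constraint ties input a to output operand 0, so r starts
    // out holding a; the instruction then adds b (%2) into it.
    asm("addl %2, %0" : "=r"(r) : "0"(a), "r"(b));
    return r;
#else
    return a + b;  // non-x86 targets: plain C++ fallback
#endif
}
```

In practice intrinsics or well-shaped C usually beat hand-written assembly, because inline asm blocks the compiler's own scheduling and optimization around them.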

C++: How can I know the size of a base class subobject?

Submitted by 浪子不回头ぞ on 2019-12-18 07:06:49
Question: Here I was discussing the Empty Base Optimization, and MSalters made this interesting comment: No class can ever have sizeof(Class)==0, empty or not. But we're talking specifically about the size of an empty base class subobject. It doesn't need its own vtable, nor a vtable pointer. Assume the common layout of a vtable pointer at offset 0; that would cause the zero-sized base class subobject to share its vtable pointer with the derived class. No problem: those should be identical anyway, that's
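The distinction the comment draws can be checked directly: a complete empty object is never zero-sized, but as a base subobject it can occupy no storage. A sketch (the assertion on `Derived` holds on the common Itanium and MSVC ABIs, though the standard only mandates the first part):

```cpp
#include <cassert>

struct Empty {};
struct Derived : Empty {
    int x;  // the Empty base subobject occupies no storage here (EBO)
};

static_assert(sizeof(Empty) >= 1, "no complete object is zero-sized");
```

So "the size of the base class subobject" here is 0, even though `sizeof(Empty)` as a complete type is 1.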

How to use an if condition with intrinsics

Submitted by ╄→гoц情女王★ on 2019-12-18 05:23:15
Question: I want to compare two floating-point variables using intrinsics. If the comparison is true, do one thing, else do another, just like a normal if..else condition. Is there any way to do this using intrinsics?

    // normal code
    vector<float> v1, v2;
    for (int i = 0; i < v1.size(); ++i)
        if (v1[i] < v2[i]) {
            // do something
        } else {
            // do something
        }

How can this be done with SSE2 or AVX?

Answer 1: SIMD conditional operations are done with branchless techniques. You use a packed-compare instruction to get a vector
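The branchless select the answer describes can be sketched as follows (the function name and scalar fallback are illustrative additions; the intrinsics are real SSE2): the packed compare yields an all-ones or all-zeros mask per lane, and AND/ANDNOT/OR pick each lane from either source.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// r[i] = (a[i] < b[i]) ? x[i] : y[i], for 4 floats at a time.
void select_lt4(const float* a, const float* b,
                const float* x, const float* y, float* r) {
#if defined(__SSE2__)
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 res  = _mm_or_ps(_mm_and_ps(mask, _mm_loadu_ps(x)),     // a<b: x
                            _mm_andnot_ps(mask, _mm_loadu_ps(y))); // else: y
    _mm_storeu_ps(r, res);
#else
    for (int i = 0; i < 4; ++i)  // non-SSE build: scalar equivalent
        r[i] = a[i] < b[i] ? x[i] : y[i];
#endif
}
```

On SSE4.1 and AVX the AND/ANDNOT/OR triple collapses into a single `_mm_blendv_ps` / `_mm256_blendv_ps`.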

Determine optimization level in preprocessor?

Submitted by 十年热恋 on 2019-12-18 04:31:46
Question: -Og is a relatively new optimization option that is intended to improve the debugging experience while applying optimizations. If a user selects -Og, then I'd like my source files to activate alternate code paths to enhance the debugging experience. GCC offers the __OPTIMIZE__ preprocessor macro, but it's only set to 1 when optimizations are in effect. Is there a way to learn the optimization level, like -O1, -O3 or -Og, for use with the preprocessor?

Answer 1: I believe this is not possible to
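A sketch of the coarse classification that GCC's real predefined macros (`__OPTIMIZE__`, `__OPTIMIZE_SIZE__`) do allow; note that none of them distinguishes -O1 from -O3 from -Og, which is the limitation the answer goes on to describe:

```cpp
#include <cassert>

const char* opt_kind() {
#if defined(__OPTIMIZE_SIZE__)
    return "size (-Os)";
#elif defined(__OPTIMIZE__)
    return "speed or debug-friendly (-O1 and up, including -Og)";
#else
    return "none (-O0)";
#endif
}
```

Because -Og defines `__OPTIMIZE__` just like -O1 and above, a project that needs -Og-specific paths typically passes its own flag, e.g. `-Og -DMY_DEBUG_OPT`.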