loop-unrolling

Loop unrolling behaviour in GCC

痴心易碎 submitted on 2019-11-30 03:49:55
Question: This question is in part a follow-up to "GCC 5.1 Loop unrolling". According to the GCC documentation, and as stated in my answer to the above question, flags such as -funroll-loops turn on "complete loop peeling (i.e. complete removal of loops with a small constant number of iterations)". Therefore, when such a flag is enabled, the compiler can choose to unroll a loop if it determines that this would optimise the execution of a given piece of code. Nevertheless, I noticed in one of my
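As a rough illustration of the two situations described in the excerpt above, the following sketch contrasts a loop with a small constant trip count (a candidate for complete peeling) with a loop whose unrolling is requested explicitly. The function names `sum4` and `sum_n` are hypothetical, and `#pragma GCC unroll` is assumed to be available (GCC 8 or later); whether the compiler actually unrolls either loop depends on its cost model and the flags used.

```cpp
#include <cstddef>

// Small constant trip count: with optimisation enabled (and flags such as
// -funroll-loops), GCC may remove this loop entirely. This is not guaranteed.
int sum4(const int (&v)[4]) {
    int s = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        s += v[i];
    }
    return s;
}

// Trip count unknown at compile time: the pragma asks GCC to unroll the
// loop body 4 times. Compilers that do not know the pragma ignore it.
int sum_n(const int* v, std::size_t n) {
    int s = 0;
#pragma GCC unroll 4
    for (std::size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

Comparing the assembly produced by `g++ -O2 -S` with and without `-funroll-loops` is one way to check what the compiler decided for each loop.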

How to vectorize my loop with g++?

痞子三分冷 submitted on 2019-11-28 07:40:13
The introductory links I found while searching: 6.59.14 Loop-Specific Pragmas; 2.100 Pragma Loop_Optimize; How to give hint to gcc about loop count; Tell gcc to specifically unroll a loop; How to Force Vectorization in C++. As you can see, most of them are for C, but I thought that they might work in C++ as well. Here is my code: template<typename T> //__attribute__((optimize("unroll-loops"))) //__attribute__ ((pure)) void foo(std::vector<T> &p1, size_t start, size_t end, const std::vector<T> &p2) { typename std::vector<T>::const_iterator it2 = p2.begin(); //#pragma simd //#pragma omp parallel for /
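The body of `foo` is cut off in the excerpt above, so the following is only a sketch of one common way to make such a loop easier for g++ to vectorize, not a reconstruction of the original code. The function `add_range` and its element-wise addition are assumptions made for illustration. It uses raw pointers with `__restrict` and `#pragma GCC ivdep`, both GCC extensions, to tell the compiler that the two ranges do not overlap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: adds src[0 .. end-start) onto dst[start .. end).
// Assumes start <= end <= dst.size(), end - start <= src.size(), and that
// dst and src are different vectors (required by __restrict).
template <typename T>
void add_range(std::vector<T>& dst, std::size_t start, std::size_t end,
               const std::vector<T>& src) {
    T* __restrict d = dst.data() + start;
    const T* __restrict s = src.data();
    const std::size_t n = end - start;
#pragma GCC ivdep  // promise: no loop-carried dependencies between iterations
    for (std::size_t i = 0; i < n; ++i) {
        d[i] += s[i];
    }
}
```

Compiling with `g++ -O3 -fopt-info-vec` should report which loops were vectorized, which makes it possible to check whether the hints had any effect.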

std::array with aggregate initialization on g++ generates huge code

不问归期 submitted on 2019-11-28 07:23:48
On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable: #include <array> #include <iostream> int main() { constexpr std::size_t size = 4096; struct S { float f; S() : f(0.0f) {} }; std::array<S, size> a = {}; // <-- note aggregate initialization for (auto& e : a) std::cerr << e.f; return 0; } Increasing size seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using -Os makes no difference. $ time g++ -O2 -std=c++11 test.cpp real 0m4.178s user 0m4.060s sys

Unroll loop and do independent sum with vectorization

守給你的承諾、 submitted on 2019-11-28 01:08:22
For the following loop GCC will only vectorize the loop if I tell it to use associative math e.g. with -Ofast . float sumf(float *x) { x = (float*)__builtin_assume_aligned(x, 64); float sum = 0; for(int i=0; i<2048; i++) sum += x[i]; return sum; } Here is the assembly with -Ofast -mavx sumf(float*): vxorps %xmm0, %xmm0, %xmm0 leaq 8192(%rdi), %rax .L2: vaddps (%rdi), %ymm0, %ymm0 addq $32, %rdi cmpq %rdi, %rax jne .L2 vhaddps %ymm0, %ymm0, %ymm0 vhaddps %ymm0, %ymm0, %ymm1 vperm2f128 $1, %ymm1, %ymm1, %ymm0 vaddps %ymm1, %ymm0, %ymm0 vzeroupper ret This clearly shows the loop has been
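The excerpt above notes that GCC only vectorizes the reduction when it is allowed to reassociate floating-point additions. A sketch of the "independent sums" idea from the title: write the reassociation out by hand with several accumulators, so the order of additions is fixed in the source. The function name `sumf_unrolled`, the explicit length parameter, and the choice of four accumulators are assumptions for illustration; the result can differ slightly from the strictly sequential sum because the additions happen in a different order.

```cpp
#include <cstddef>

// Sums n floats using four independent partial sums.
// Each accumulator forms its own dependency chain, so the additions
// inside one iteration do not have to wait on each other.
float sumf_unrolled(const float* x, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    // Handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}
```

Whether GCC turns the four scalar accumulators into a single vector register without `-Ofast` is something to verify in the generated assembly rather than assume.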

Self-unrolling macro loop in C/C++

拈花ヽ惹草 submitted on 2019-11-27 20:45:28
I am currently working on a project where every cycle counts. While profiling my application I discovered that the overhead of some inner loops is quite high, because they consist of just a few machine instructions. Additionally, the number of iterations in these loops is known at compile time. So I thought that, instead of manually unrolling the loop with copy & paste, I could use macros to unroll the loop at compile time so that it can be easily modified later. What I imagine is something like this: #define LOOP_N_TIMES(N, CODE) <insert magic here> So that I can replace for (int i = 0; i < N; ++i) { do
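The excerpt above asks for a macro; in C++ a template-based alternative is possible, and the following is a minimal sketch of that swapped-in technique rather than the `LOOP_N_TIMES` macro itself. The names `unroll` and `unroll_impl` are hypothetical. It relies on `std::index_sequence` (C++14) to expand the body once per index at compile time; whether the calls are actually inlined into straight-line code is still up to the optimiser.

```cpp
#include <cstddef>
#include <utility>

// Expands f(0), f(1), ..., f(N-1) as a braced initializer list, which
// guarantees left-to-right evaluation of the calls.
template <typename F, std::size_t... I>
inline void unroll_impl(F& f, std::index_sequence<I...>) {
    int expand[] = {0, (f(I), 0)...};
    (void)expand;  // silence the unused-variable warning
}

// Calls f(i) for every i in [0, N), with N known at compile time.
template <std::size_t N, typename F>
inline void unroll(F f) {
    unroll_impl(f, std::make_index_sequence<N>{});
}
```

A loop such as `for (int i = 0; i < 4; ++i) total += i;` could then be written as `unroll<4>([&](std::size_t i) { total += static_cast<int>(i); });`, and changing the count later only requires editing the template argument.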

How to vectorize my loop with g++?

落花浮王杯 submitted on 2019-11-27 01:59:39
Question: The introductory links I found while searching: 6.59.14 Loop-Specific Pragmas; 2.100 Pragma Loop_Optimize; How to give hint to gcc about loop count; Tell gcc to specifically unroll a loop; How to Force Vectorization in C++. As you can see, most of them are for C, but I thought that they might work in C++ as well. Here is my code: template<typename T> //__attribute__((optimize("unroll-loops"))) //__attribute__ ((pure)) void foo(std::vector<T> &p1, size_t start, size_t end, const std::vector<T> &p2)

std::array with aggregate initialization on g++ generates huge code

霸气de小男生 submitted on 2019-11-27 01:48:30
Question: On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable: #include <array> #include <iostream> int main() { constexpr std::size_t size = 4096; struct S { float f; S() : f(0.0f) {} }; std::array<S, size> a = {}; // <-- note aggregate initialization for (auto& e : a) std::cerr << e.f; return 0; } Increasing size seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++

Unroll loop and do independent sum with vectorization

强颜欢笑 submitted on 2019-11-26 21:50:48
Question: For the following loop GCC will only vectorize the loop if I tell it to use associative math e.g. with -Ofast . float sumf(float *x) { x = (float*)__builtin_assume_aligned(x, 64); float sum = 0; for(int i=0; i<2048; i++) sum += x[i]; return sum; } Here is the assembly with -Ofast -mavx sumf(float*): vxorps %xmm0, %xmm0, %xmm0 leaq 8192(%rdi), %rax .L2: vaddps (%rdi), %ymm0, %ymm0 addq $32, %rdi cmpq %rdi, %rax jne .L2 vhaddps %ymm0, %ymm0, %ymm0 vhaddps %ymm0, %ymm0, %ymm1 vperm2f128 $1,