loop-unrolling

In what types of loops is it best to use the #pragma unroll directive in CUDA?

╄→гoц情女王★ posted on 2019-12-04 10:51:14
Question: In CUDA it is possible to unroll loops using the #pragma unroll directive to improve performance by increasing instruction-level parallelism. The #pragma can optionally be followed by a number that specifies how many times the loop must be unrolled. Unfortunately the docs do not give specific directions on when this directive should be used. Since small loops with a known trip count are already unrolled by the compiler, should #pragma unroll be used on larger loops? On small loops with a

GCC 5.1 Loop unrolling

时光总嘲笑我的痴心妄想 posted on 2019-12-04 07:17:42
Given the following code: #include <stdio.h> int main(int argc, char **argv) { int k = 0; for( k = 0; k < 20; ++k ) { printf( "%d\n", k ) ; } } compiling with GCC 5.1 or later and the options -x c -std=c99 -O3 -funroll-all-loops --param max-completely-peeled-insns=1000 --param max-completely-peel-times=10000 only partially unrolls the loop: it unrolls the loop ten times and then does a conditional jump. .LC0: .string "%d\n" main: pushq %rbx xorl %ebx, %ebx .L2: movl %ebx, %esi movl $.LC0, %edi xorl %eax, %eax call printf leal 1(%rbx), %esi movl $.LC0, %edi xorl %eax, %eax call printf leal 2(%rbx), %esi movl $.LC0,

How to ask GCC to completely unroll this loop (i.e., peel this loop)?

十年热恋 posted on 2019-12-04 01:00:02
Question: Is there a way to instruct GCC (I'm using 4.8.4) to unroll the while loop in the bottom function completely, i.e., peel this loop? The number of iterations of the loop is known at compilation time: 58. Let me first explain what I have tried. Checking the GAS output of gcc -fpic -O2 -S GEPDOT.c shows that 12 registers, XMM0 - XMM11, are used. If I pass the flag -funroll-loops to gcc: gcc -fpic -O2 -funroll-loops -S GEPDOT.c the loop is only unrolled two times. I checked the GCC optimization options. GCC says
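One option the excerpt does not mention is a per-loop unroll request. The following is a minimal sketch, assuming a compiler that supports `#pragma GCC unroll` (GCC 8 and later, so not the 4.8.4 used in the question). The function `dot58` and its body are hypothetical, since the loop in GEPDOT.c is not shown above; only the trip count of 58 comes from the question.

```cpp
// Hypothetical stand-in for the loop in GEPDOT.c (the real body is not shown).
// Only the trip count, 58, is taken from the question.
double dot58(const double* a, const double* b) {
    double acc = 0.0;
    // Ask GCC (8 or later) to unroll the next loop up to 58 times.
    // This is a request to the optimizer; compilers that do not know
    // the pragma ignore it, possibly with a warning.
#pragma GCC unroll 58
    for (int i = 0; i < 58; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

The pragma changes only code generation, never the result. Whether the loop was actually peeled completely still has to be confirmed in the assembly output, for example with the -S invocation used in the question.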

Alternative to if, else if

柔情痞子 posted on 2019-12-03 15:59:33
Question: I have a lot of if, else if statements and I know there has to be a better way to do this, but even after searching Stack Overflow I'm unsure of how to do so in my particular case. I am parsing text files (bills) and assigning the name of the service provider to a variable (txtvar.Provider) based on whether certain strings appear on the bill. This is a small sample of what I'm doing (don't laugh, I know it's messy). All in all, there are approximately 300 if, else if's. if (txtvar.BillText.IndexOf(

template arguments inside a compile time unrolled for loop?

一笑奈何 posted on 2019-12-03 14:48:28
Wikipedia (here) gives a compile-time unrolling of a for loop. I was wondering whether we can use a similar for loop with template statements inside. For example, is the following loop valid? template<int max_subdomain> void Device<max_subdomain>::createSubDomains() { for(int i=0; i< max_subdomain; ++i) { SubDomain<i> tmp(member); ... // some operations on tmp ... } } SubDomain is a class which takes an int template parameter and here has been constructed with an argument that is a member of the Device class. Thanks for the answer guys... now that you know what I want... is there any way
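The loop in the question cannot compile as written, because `i` is a runtime variable and a template argument must be a compile-time constant. One common workaround is to expand an integer pack instead of looping. The following is a minimal sketch, assuming C++14; the `SubDomain` defined here is a hypothetical stand-in that only records its index, and the free function `createSubDomains` replaces the question's member function for brevity.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's SubDomain<int>: it records its
// compile-time index so the effect of the expansion can be observed.
template <int Index>
struct SubDomain {
    explicit SubDomain(std::vector<int>& log) { log.push_back(Index); }
};

// Instantiates SubDomain<0>, SubDomain<1>, ... once per index in the pack.
template <int... Is>
void createSubDomainsImpl(std::vector<int>& log,
                          std::integer_sequence<int, Is...>) {
    // Braced-init-list elements are evaluated left to right, so the
    // temporaries are constructed in index order. The leading 0 keeps
    // the array non-empty when the pack is empty.
    int expand[] = {0, (static_cast<void>(SubDomain<Is>{log}), 0)...};
    static_cast<void>(expand);
}

// Compile-time "loop" over 0 .. MaxSubdomain-1.
template <int MaxSubdomain>
void createSubDomains(std::vector<int>& log) {
    createSubDomainsImpl(log, std::make_integer_sequence<int, MaxSubdomain>{});
}
```

Calling `createSubDomains<4>(log)` on an empty vector leaves `log` holding 0, 1, 2, 3. With C++17 the array trick could be replaced by a fold expression over the comma operator.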

In what types of loops is it best to use the #pragma unroll directive in CUDA?

落花浮王杯 posted on 2019-12-03 06:57:33
In CUDA it is possible to unroll loops using the #pragma unroll directive to improve performance by increasing instruction level parallelism. The #pragma can optionally be followed by a number that specifies how many times the loop must be unrolled. Unfortunately the docs do not give specific directions on when this directive should be used. Since small loops with a known trip count are already unrolled by the compiler, should #pragma unroll be used on larger loops? On small loops with a variable counter? And what about the optional number of unrolls? Also is there recommended documentation

Alternative to if, else if

别等时光非礼了梦想. posted on 2019-12-03 05:18:08
I have a lot of if, else if statements and I know there has to be a better way to do this, but even after searching Stack Overflow I'm unsure of how to do so in my particular case. I am parsing text files (bills) and assigning the name of the service provider to a variable (txtvar.Provider) based on whether certain strings appear on the bill. This is a small sample of what I'm doing (don't laugh, I know it's messy). All in all, there are approximately 300 if, else if's. if (txtvar.BillText.IndexOf("SWGAS.COM") > -1) { txtvar.Provider = "Southwest Gas"; } else if (txtvar.BillText.IndexOf("georgiapower
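A long chain like this can usually be replaced by a data table and a single loop. The question's code is C#; the sketch below shows the same idea in C++, under stated assumptions. The names `ProviderRule`, `kRules`, and `findProvider` are hypothetical, and only the "SWGAS.COM" entry comes from the excerpt; the second entry is an invented placeholder.

```cpp
#include <string>

// One row per provider: the marker text to search for and the name to assign.
struct ProviderRule {
    const char* marker;
    const char* provider;
};

// Only the first row is taken from the question. The second row is a
// hypothetical placeholder showing where further providers would go.
static const ProviderRule kRules[] = {
    {"SWGAS.COM", "Southwest Gas"},
    {"EXAMPLE-POWER.COM", "Example Power"},  // hypothetical entry
};

// Returns the provider of the first rule whose marker occurs in the bill
// text, or an empty string if no rule matches. Rules are tried in table
// order, which mirrors the order of the original else-if chain.
std::string findProvider(const std::string& billText) {
    for (const ProviderRule& rule : kRules) {
        if (billText.find(rule.marker) != std::string::npos) {
            return rule.provider;
        }
    }
    return std::string();
}
```

With this layout, adding a provider means adding one table row rather than another branch, and the table could just as well be loaded from a configuration file.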

Java JIT loop unrolling policy?

坚强是说给别人听的谎言 posted on 2019-12-01 16:51:47
What is the loop unrolling policy for the JIT? Or, if there is no simple answer to that, is there some way I can check where and when loop unrolling is being performed in a loop? GNode child = null; for(int i=0;i<8;i++){ child = octree.getNeighbor(nn, i, MethodFlag.NONE); if(child==null) break; RecurseForce(leaf, child, dsq, epssq); } Basically, the piece of code above has a static number of iterations (eight), and it performs poorly when I leave the for loop as it is. But when I manually unroll the loop, it does significantly better. I am interested in finding out if the JIT actually does

How do optimizing compilers decide when and how much to unroll a loop?

∥☆過路亽.° posted on 2019-12-01 03:14:55
When a compiler performs a loop-unroll optimization, how does it determine by which factor to unroll the loop, or whether to unroll the whole loop? Since this is a space-performance trade-off, on average how effective is this optimization technique in making the program perform better? Also, under what conditions is it recommended to use this technique (i.e. certain operations or calculations)? This doesn't have to be specific to a certain compiler. It can be any explanation outlining the idea behind this technique and what has been observed in practice.
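To make the trade-off concrete, here is a minimal hand-written sketch of what unrolling by a factor of four looks like. It is only an illustration of the transformation, not a description of how any particular compiler implements it; the function names `sumRolled` and `sumUnrolled4` are hypothetical.

```cpp
#include <cstddef>

// Reference version: one add plus one loop-condition check per element.
long sumRolled(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled by a factor of 4: one loop-condition check per four elements,
// at the cost of a larger loop body and an extra remainder loop.
long sumUnrolled4(const int* data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    // Main loop: runs while at least four elements remain.
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same sum for any input. The unrolled version has less branch overhead per element but more code, which is the space-performance trade-off the question asks about.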

Loop unrolling behaviour in GCC

若如初见. posted on 2019-11-30 19:26:18
This question is in part a follow-up to "GCC 5.1 Loop unrolling". According to the GCC documentation, and as stated in my answer to the above question, flags such as -funroll-loops turn on "complete loop peeling (i.e. complete removal of loops with a small constant number of iterations)". Therefore, when such a flag is enabled, the compiler can choose to unroll a loop if it determines that this would optimise the execution of a given piece of code. Nevertheless, I noticed in one of my projects that GCC would sometimes unroll loops even though the relevant flags were not enabled. For