Unroll loops in an AMD OpenCL kernel
Question

I'm trying to assess OpenCL performance on AMD hardware. I have a kernel for the Hough transform that contains two #pragma unroll statements, but when I run the kernel they do not produce any speedup.

    kernel void hough_circle(read_only image2d_t imageIn, global int* in, const int w_hough, __global int* circle)
    {
        sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
        int gid0 = get_global_id(0);
        int gid1 = get_global_id(1);
        uint4 pixel;
        int x0 = 0, y0 = 0, r;