Nested loops in OpenCl Kernel

牧云@^-^@ 提交于 2019-12-13 03:07:14

问题


I have recently started trying to study OpenCl and am trying to convert the following code into an efficient OpenCl kernel:

for(int i = 0; i < VECTOR_SIZE; i++)
{
    for(int j = 0; j < 100; j++)
    {
        C[i] = sqrt(A[i] + sqrt(A[i] * B[i])) * sqrt(A[i] + sqrt(A[i] * B[i]));
    }
}

This is what I have come up with so far using different tutorials. My question is, can I somehow get rid of the outer loop in my kernel. Would you say that this is an okey implementation of the above C++ code and no further thing can be done to make it more efficient or close to how an openCL program is supposed to be like.

Also, all the tutorials that I have read so far have the kernels written in a const char *. What is reason behind this and is this the only way OPenCL kernels are written or usually we code them in some other file and then include it in our regular code or something.

Thanks

     const char *RandomComputation =
"__kernel                                   \n"
"void RandomComputation(                              "
"                  __global float *A,       \n"
"                  __global float *B,       \n"
"                  __global float *C)       \n"
"{                                          \n"
"    //Get the index of the work-item       \n"
"    int index = get_global_id(0);          \n"
"   for (int j = 0; j < 100 ; j++)          \n"
"   {                                       \n"
"    C[index] = sqrt(A[index] + sqrt(A[index] * B[index])) * sqrt(A[index] + sqrt(A[index] * B[index])); \n"
"}                                          \n"
"}                                          \n";

回答1:


When you want to use nested loop in OpenCL kernel , use the two dimension like this example as matrix multiplication .

__kernel void matrixMul(__global float* C, 
      __global float* A, 
      __global float* B, 
      int wA, int wB)
{
   int tx = get_global_id(0); 
   int ty = get_global_id(1);
   float value = 0;
   for (int k = 0; k < wA; ++k)
   {
     float elementA = A[ty * wA + k];
     float elementB = B[k * wB + tx];
     value += elementA * elementB;
   }
   C[ty * wA + tx] = value;
}

Did you need full explanation on here



来源:https://stackoverflow.com/questions/24157242/nested-loops-in-opencl-kernel

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!