问题
I have recently started trying to study OpenCl and am trying to convert the following code into an efficient OpenCl kernel:
for(int i = 0; i < VECTOR_SIZE; i++)
{
for(int j = 0; j < 100; j++)
{
C[i] = sqrt(A[i] + sqrt(A[i] * B[i])) * sqrt(A[i] + sqrt(A[i] * B[i]));
}
}
This is what I have come up with so far using different tutorials. My question is, can I somehow get rid of the outer loop in my kernel. Would you say that this is an okey implementation of the above C++ code and no further thing can be done to make it more efficient or close to how an openCL program is supposed to be like.
Also, all the tutorials that I have read so far have the kernels written in a const char *. What is reason behind this and is this the only way OPenCL kernels are written or usually we code them in some other file and then include it in our regular code or something.
Thanks
const char *RandomComputation =
"__kernel \n"
"void RandomComputation( "
" __global float *A, \n"
" __global float *B, \n"
" __global float *C) \n"
"{ \n"
" //Get the index of the work-item \n"
" int index = get_global_id(0); \n"
" for (int j = 0; j < 100 ; j++) \n"
" { \n"
" C[index] = sqrt(A[index] + sqrt(A[index] * B[index])) * sqrt(A[index] + sqrt(A[index] * B[index])); \n"
"} \n"
"} \n";
回答1:
When you want to use nested loop in OpenCL kernel , use the two dimension like this example as matrix multiplication .
__kernel void matrixMul(__global float* C,
__global float* A,
__global float* B,
int wA, int wB)
{
int tx = get_global_id(0);
int ty = get_global_id(1);
float value = 0;
for (int k = 0; k < wA; ++k)
{
float elementA = A[ty * wA + k];
float elementB = B[k * wB + tx];
value += elementA * elementB;
}
C[ty * wA + tx] = value;
}
Did you need full explanation on here
来源:https://stackoverflow.com/questions/24157242/nested-loops-in-opencl-kernel