Question
I've been trying to analyze my kernel's results in parallel with its execution by splitting the work across multiple calls. clEnqueueReadBuffer takes a boolean that determines whether it blocks, but clEnqueueNDRangeKernel has no such parameter, so I assumed it was always asynchronous (it is being "enqueued", after all, which suggests it behaves like a task queue). However, when I run the block of code below, the surrounding code does not continue until the kernel has finished completely, even though I am not explicitly calling clFinish or anything else that would cause this behavior.
I'm running the kernel on an NVIDIA GPU. Why is this segment of code blocking, and what can I do to remedy it within OpenCL? Failing that, I'm considering running a separate thread solely to enqueue these kernel commands.
    const size_t amountPerGo = multipleRoundUp(local_ws, (int)(50000));
    // Finds the smallest multiple of the local work size that is greater than the 50000 segment
    std::cout << "Launch" << std::endl;
    for( int j = 0; j < 10; j++ ) // Make the effects more extreme
    {
        for( size_t i = 0; i < dimensions.x*dimensions.y; i += amountPerGo )
        {
            clSetKernelArg(rayKernel, 6, sizeof(int), &i);
            std::cout << "sub" << std::endl;
            error = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &amountPerGo, &local_ws, 0, NULL, NULL);
            // Reading back
            clEnqueueReadBuffer(queue, outResult, CL_FALSE, sizeof(vec4)*i, sizeof(vec4)*(amountPerGo), resultSet+i, 0, NULL, NULL);
        }
    }
    std::cout << "End launch Start" << std::endl;
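The question does not show multipleRoundUp. The following is a minimal sketch of what such a helper presumably does, assuming the first argument is the multiple (the local work size) and that "greater than" means "at least". The body of multipleRoundUp and the chunkOffsets function are hypothetical and are not taken from the original code. chunkOffsets lists the values of i that the inner loop would pass as kernel argument 6.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction: rounds `value` up to the nearest multiple
// of `multiple`. Assumes multiple > 0.
std::size_t multipleRoundUp(std::size_t multiple, std::size_t value) {
    const std::size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

// Hypothetical helper: the starting offsets the inner loop iterates over,
// i.e. 0, amountPerGo, 2 * amountPerGo, ... while the offset is below total.
std::vector<std::size_t> chunkOffsets(std::size_t total, std::size_t amountPerGo) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < total; i += amountPerGo) {
        offsets.push_back(i);
    }
    return offsets;
}
```

With a local work size of 64, multipleRoundUp(64, 50000) gives 50048. For 120000 work items that yields three chunks starting at 0, 50048 and 100096, so the last chunk extends past the end of the range unless the buffers are padded to a multiple of amountPerGo.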
Answer 1:
It is possible that the OpenCL kernel is still executing while its arguments are being set for the next call. Try using different kernel objects.
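If enqueueing still blocks, the fallback raised in the question, a thread that does nothing but enqueue, could look roughly like the sketch below. It is plain C++ with no OpenCL calls. enqueueChunk is a hypothetical stand-in for the clSetKernelArg, clEnqueueNDRangeKernel and clEnqueueReadBuffer calls of one chunk, and its sleep only simulates an enqueue that blocks. enqueueOnWorker is likewise a made-up name.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for enqueueing one chunk; the sleep simulates a
// call that blocks the thread it runs on.
void enqueueChunk(std::size_t /*offset*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}

// Issues every enqueue from a worker thread so the calling thread stays
// free. Returns the number of chunks enqueued. Assumes amountPerGo > 0.
std::size_t enqueueOnWorker(std::size_t total, std::size_t amountPerGo) {
    std::atomic<std::size_t> chunksEnqueued{0};
    const std::size_t expected = (total + amountPerGo - 1) / amountPerGo;

    std::thread worker([&] {
        for (std::size_t i = 0; i < total; i += amountPerGo) {
            enqueueChunk(i);
            ++chunksEnqueued;
        }
    });

    // The calling thread is not blocked by the enqueues. Here it only polls
    // the counter; real code would analyze finished chunks at this point.
    while (chunksEnqueued.load() < expected) {
        std::this_thread::yield();
    }

    worker.join();
    return chunksEnqueued.load();
}
```

Note that the counter only records that a chunk was enqueued. Knowing that its results have actually arrived in resultSet would still require something like the event that clEnqueueReadBuffer can return through its last parameter.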
Source: https://stackoverflow.com/questions/25437554/clenqueuendrangekernel-blocks-execution