NDRange Number of work-items

Posted by 你。 on 2019-12-13 06:27:10

Question


I'm trying to copy an image using OpenCL:

std::string kernelCode =
            "void kernel copy(global const int* image, global int* result)"
            "{"
                "result[get_global_id(0)] = image[get_global_id(0)];"
            "}";

The image contains 200 * 300 pixels.

The maximum number of work-items is 4100 according to CL_DEVICE_MAX_WORK_GROUP_SIZE.

In the queue:

int size = _originalImage.width() * _originalImage.height();
//...
queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(size), cl::NullRange);

Gives segfault.

queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(10000), cl::NullRange);

Runs fine, but it gives back only part of the image.

What am I missing here?


Answer 1:


As you have already stated correctly, your CL_DEVICE_MAX_WORK_GROUP_SIZE is less than the number of threads (work-items) you want to start. The segfault indicates an error in the runtime. To have the OpenCL C++ wrapper report errors as C++ exceptions, add the following define at the beginning of your source file, before you include any OpenCL headers:

#define __CL_ENABLE_EXCEPTIONS 

The second call clearly copies only the first 10000 pixels of your image instead of all 60000. If you want to use only 10000 threads per call, you need to make this call six times, with an adjusted NDRange offset each time.
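A minimal sketch of how the six calls could be planned, assuming a device and runtime that support a non-zero global offset (OpenCL 1.1 or later, where get_global_id(0) includes the offset). The names Launch and planLaunches are hypothetical helpers for illustration, not part of the OpenCL API; the actual enqueue call is shown only in a comment, so the block compiles without OpenCL headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One kernel launch: `count` work-items starting at global id `offset`.
// (Hypothetical type, introduced only for this sketch.)
struct Launch {
    std::size_t offset;
    std::size_t count;
};

// Split `total` work-items into launches of at most `chunk` items each.
// The last launch is shortened if `total` is not a multiple of `chunk`.
std::vector<Launch> planLaunches(std::size_t total, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<Launch> launches;
    for (std::size_t offset = 0; offset < total; offset += chunk) {
        launches.push_back({offset, std::min(chunk, total - offset)});
    }
    return launches;
}

// Intended use with the objects from the question (not compiled here):
//
//   for (const Launch& l : planLaunches(size, 10000)) {
//       queue.enqueueNDRangeKernel(imgProcess,
//                                  cl::NDRange(l.offset),  // global offset
//                                  cl::NDRange(l.count));  // global size
//   }
```

For the 200 * 300 image, planLaunches(60000, 10000) yields six launches with offsets 0, 10000, ..., 50000, each covering 10000 pixels.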

Generally, I would advise either using cl::copy to copy the image or modifying your kernel so that each thread copies multiple pixels.
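One possible shape for a kernel that copies multiple pixels per thread is a strided loop: each work-item starts at its own global id and advances by the global size until it runs past the end of the image. The sketch below is an assumption about how this could look, not the asker's code: the extra kernel argument n (the pixel count) and the helper emulateStridedCopy are both hypothetical. The helper replays the same loop on the host so that the index pattern can be checked without an OpenCL device.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Kernel variant in which each work-item copies several pixels.
// The third argument `n` is an assumption; the question's kernel has no such argument.
const std::string stridedKernelCode =
    "__kernel void copy(__global const int* image, __global int* result, const int n)"
    "{"
    "    for (int i = get_global_id(0); i < n; i += get_global_size(0))"
    "        result[i] = image[i];"
    "}";

// Host-side emulation of the kernel's loop (hypothetical helper).
// `globalSize` plays the role of get_global_size(0), `gid` of get_global_id(0).
void emulateStridedCopy(const std::vector<int>& image,
                        std::vector<int>& result,
                        std::size_t globalSize) {
    assert(globalSize > 0 && result.size() >= image.size());
    const std::size_t n = image.size();
    for (std::size_t gid = 0; gid < globalSize; ++gid) {
        for (std::size_t i = gid; i < n; i += globalSize) {
            result[i] = image[i];
        }
    }
}
```

With this pattern the global size no longer has to match the pixel count, so a single enqueueNDRangeKernel call with a smaller NDRange can cover all 60000 pixels, provided n is passed as the third kernel argument.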

Furthermore, I'm quite unsure about the effect of setting the local work-group size to NullRange. As the local work-group size does not matter in your case, I think it is best to leave out this parameter and use the version of enqueueNDRangeKernel with only three arguments (omitting the last one).



Source: https://stackoverflow.com/questions/21612136/ndrange-number-of-work-items
