OpenCL global work size interpreted differently on Haswell & Kaby Lake iGPUs

Question

Our kernel is initialized with:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

The kernel implements a typical convolution over an image. It works fine on a machine with a Kaby Lake iGPU, but on Haswell or Bay Trail machines the global work size is interpreted as {60, 60}, so the kernel executes with the wrong NDRange.

On all systems the platform is OpenCL 1.2 with beignet 1.3.

Is this a known issue? Or is there a hardware-dependent limit to the global work size? There doesn't seem to be any info on that in the OpenCL Programming Guide.

Answer 1:

Local work size and global work size must have the same number of dimensions. See the documentation for clEnqueueNDRangeKernel:

local_work_size  Points to an array of work_dim unsigned values
global_work_size  Points to an array of work_dim unsigned values

Now look at your code:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

If you enqueue a kernel with those arrays and with work_dim == 2, the driver will read them as

size_t localWorkSize[2] = {1, something};
size_t globalWorkSize[2] = {60, 80};

where something is whatever happens to be on the stack after localWorkSize, so it can differ from machine to machine. You need to declare

size_t localWorkSize[2] = {1, 1};
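
One way to keep the two arrays from drifting apart again is to size both from a single constant and check them before enqueueing. The following is only a minimal sketch: WORK_DIM and ndrange_is_valid are hypothetical names that do not appear in the original code, and the divisibility check reflects the OpenCL 1.2 rule that each global size must be a multiple of the matching local size when a local size is passed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical shared constant: the work_dim later passed to clEnqueueNDRangeKernel. */
enum { WORK_DIM = 2 };

/* Both arrays are sized from WORK_DIM, so neither can end up shorter than work_dim. */
static const size_t globalWorkSize[WORK_DIM] = {60, 80};
static const size_t localWorkSize[WORK_DIM]  = {1, 1};

/* Compile-time guard in case someone later hard-codes a different array length. */
static_assert(sizeof globalWorkSize / sizeof globalWorkSize[0] == WORK_DIM,
              "globalWorkSize must have WORK_DIM entries");
static_assert(sizeof localWorkSize / sizeof localWorkSize[0] == WORK_DIM,
              "localWorkSize must have WORK_DIM entries");

/* Hypothetical helper: returns 1 if every local size is non-zero and evenly
 * divides the matching global size, 0 otherwise. */
static int ndrange_is_valid(size_t work_dim, const size_t *global, const size_t *local)
{
    for (size_t d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
    }
    return 1;
}
```

With these definitions the enqueue call would look like clEnqueueNDRangeKernel(queue, kernel, WORK_DIM, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL), where queue and kernel stand for your existing command queue and kernel objects. If you do not care about the work-group shape, passing NULL for local_work_size lets the implementation choose one.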

Source: https://stackoverflow.com/questions/54909805/opencl-global-worskize-interpreted-differently-on-haswell-kabylake-igpus

Tags

opencl
