OpenCL global work size interpreted differently on Haswell & Kaby Lake iGPUs

家住魔仙堡 submitted on 2019-12-13 20:46:25

Question


Our kernel is initialized with:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

The kernel implements a typical convolution on an image file. It works fine on a machine with a Kaby Lake iGPU, but on Haswell or Bay Trail machines the global work size is interpreted as {60, 60}, so the kernel executes with the wrong NDRange.

On all systems our platform is OpenCL 1.2 beignet 1.3.

Is this a known issue? Or is there a hardware-dependent limit to the global work size? There doesn't seem to be any info on that in the OpenCL Programming Guide.


Answer 1:


The local work size and the global work size must have the same number of dimensions. See the documentation for clEnqueueNDRangeKernel:

local_work_size  Points to an array of work_dim unsigned values
global_work_size  Points to an array of work_dim unsigned values

Your code declares:

size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};

If you enqueue a kernel with those arrays and work_dim == 2, the driver will read them as

size_t localWorkSize[2] = {1, something};
size_t globalWorkSize[2] = {60, 80};

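where something is whatever happens to sit in memory right after localWorkSize. Reading past the end of the array is undefined behavior, which is why the result can differ from one machine to another. You need to declare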

size_t localWorkSize[2] = {1, 1};
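
As a minimal sketch of how to guard against this mismatch, the arrays can be declared without an explicit length and checked at compile time against the dimension count. The names WORK_DIM, ARRAY_LEN and work_sizes_divide_evenly are hypothetical helpers introduced here for illustration, not part of OpenCL. The divisibility check reflects the OpenCL 1.2 rule that each global size must be evenly divisible by the matching local size.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

#define WORK_DIM 2    /* hypothetical constant: value passed as work_dim */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Lengths are taken from the initializers, so a missing entry is caught below. */
const size_t globalWorkSize[] = {60, 80};
const size_t localWorkSize[]  = {1, 1};

/* Compile-time check: both arrays must hold exactly WORK_DIM values. */
static_assert(ARRAY_LEN(globalWorkSize) == WORK_DIM, "globalWorkSize needs WORK_DIM entries");
static_assert(ARRAY_LEN(localWorkSize) == WORK_DIM, "localWorkSize needs WORK_DIM entries");

/* Hypothetical helper: returns 1 if every size is nonzero and each global
   size is a multiple of the matching local size, 0 otherwise. */
int work_sizes_divide_evenly(const size_t *global, const size_t *local, size_t work_dim)
{
    for (size_t d = 0; d < work_dim; ++d) {
        if (global[d] == 0 || local[d] == 0 || global[d] % local[d] != 0)
            return 0;
    }
    return 1;
}

/* With an existing command queue and kernel (not created in this sketch),
   the enqueue call would then look like:

   clEnqueueNDRangeKernel(queue, kernel, WORK_DIM, NULL,
                          globalWorkSize, localWorkSize, 0, NULL, NULL);
*/
```

With the original one-element localWorkSize, the second static_assert would fail to compile instead of letting the driver read past the array. If no particular work-group size is needed, passing NULL as local_work_size lets the implementation choose one.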


Source: https://stackoverflow.com/questions/54909805/opencl-global-worskize-interpreted-differently-on-haswell-kabylake-igpus
