Question
Our kernel is launched with the following work sizes:
size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};
The kernel implements a typical convolution on an image file. It works fine on a machine with a Kaby Lake iGPU, but on Haswell or Bay Trail machines the global work size is interpreted as {60, 60}, so the kernel runs over the wrong NDRange.
On all systems the platform is OpenCL 1.2 (Beignet 1.3).
Is this a known issue? Or is there a hardware-dependent limit to the global work size? There doesn't seem to be any info on that in the OpenCL Programming Guide.
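To make the symptom concrete: the intended {60, 80} range launches 60 × 80 = 4800 work-items, while {60, 60} launches only 3600, so 1200 work-items never execute. The serial C sketch below enumerates the (get_global_id(0), get_global_id(1)) pairs of a 2D NDRange and counts how many pixels get processed. It assumes, hypothetically, a 60 × 80 image and one output pixel per work-item; the question shows neither the image size nor the kernel, and `pixels_processed` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for a 2D NDRange launch (hypothetical helper).
   Visits every (gid0, gid1) pair the way get_global_id(0) and
   get_global_id(1) enumerate them, and counts how many pixels of a
   width x height image are processed. Assumes one output pixel per
   work-item, which the question does not confirm. */
static size_t pixels_processed(const size_t global[2], size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t count = 0;
    for (size_t gid1 = 0; gid1 < global[1]; ++gid1) {
        for (size_t gid0 = 0; gid0 < global[0]; ++gid0) {
            /* A convolution kernel typically bounds-checks before writing. */
            if (gid0 < width && gid1 < height)
                ++count;
        }
    }
    return count;
}
```

Called with {60, 80} and {60, 60} on the assumed 60 × 80 image, it returns 4800 and 3600: the last 20 rows (gid1 from 60 to 79) would stay unprocessed.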
Answer 1:
The local work size and global work size arrays must have the same number of elements, namely work_dim. See the documentation for clEnqueueNDRangeKernel:
local_work_size Points to an array of work_dim unsigned values
global_work_size Points to an array of work_dim unsigned values
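One way to make this rule hard to break in C, assuming both sizes are declared as real arrays rather than pointers, is to derive work_dim from the array length and reject mismatched lengths at compile time. `ARRAY_LEN`, `NDRANGE_WORK_DIM`, and `example_work_dim` are hypothetical names for this sketch, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a real array; gives a wrong answer for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical macro (not part of OpenCL): evaluates to the work_dim for
   clEnqueueNDRangeKernel, taken from the global-size array, and fails to
   compile (negative array size) if the local-size array has a different
   number of elements. */
#define NDRANGE_WORK_DIM(global, local)                                   \
    ((void)sizeof(char[ARRAY_LEN(global) == ARRAY_LEN(local) ? 1 : -1]), \
     (unsigned)ARRAY_LEN(global))

static unsigned example_work_dim(void)
{
    size_t globalWorkSize[2] = {60, 80};
    size_t localWorkSize[2]  = {1, 1};
    /* With size_t localWorkSize[1] = {1}; the next line would not compile. */
    unsigned work_dim = NDRANGE_WORK_DIM(globalWorkSize, localWorkSize);
    /* Dimensions 1 to 3 are valid on every conforming device, since
       CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS is at least 3. */
    assert(work_dim >= 1 && work_dim <= 3);
    return work_dim;
}
```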
So consider your code:
size_t localWorkSize[1] = {1};
size_t globalWorkSize[2] = {60, 80};
If you enqueue the kernel with these arrays and work_dim == 2, the driver reads them as
size_t localWorkSize[2] = {1, something};
size_t globalWorkSize[2] = {60, 80};
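where something is whatever happens to be in memory just past localWorkSize on the stack. Reading it is undefined behavior, so the outcome can vary between machines and builds. Instead, declare: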
size_t localWorkSize[2] = {1, 1};
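With the arrays fixed, a host-side sanity check can catch a bad configuration before it reaches the driver. The sketch below assumes the OpenCL 1.2 rule for a non-NULL local_work_size: each global size must be evenly divisible by the matching local size, otherwise clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. `ndrange_sizes_ok` and `corrected_setup_ok` are hypothetical helpers, and the actual enqueue call appears only in a comment so the snippet compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-flight check (not part of the OpenCL API). Mirrors the
   OpenCL 1.2 rule for a non-NULL local_work_size: every local size must be
   nonzero and divide the matching global size evenly. Both arrays must hold
   work_dim elements. */
static int ndrange_sizes_ok(unsigned work_dim, const size_t *global, const size_t *local)
{
    assert(global != NULL);
    if (local == NULL)
        return 1; /* NULL: the implementation chooses the work-group size */
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
    }
    return 1;
}

/* Corrected setup: both arrays have two elements for a 2D range.
   With an OpenCL context, this would be enqueued as
     clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                            globalWorkSize, localWorkSize, 0, NULL, NULL);
   where queue and kernel are assumed to exist already. */
static int corrected_setup_ok(void)
{
    size_t globalWorkSize[2] = {60, 80};
    size_t localWorkSize[2]  = {1, 1};
    return ndrange_sizes_ok(2, globalWorkSize, localWorkSize);
}
```

A work-group size of 1 is valid but typically underuses the GPU. Passing NULL as local_work_size lets the implementation choose the work-group size, which also avoids the array-length mismatch altogether.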
Source: https://stackoverflow.com/questions/54909805/opencl-global-worskize-interpreted-differently-on-haswell-kabylake-igpus