Using a barrier causes a CL_INVALID_WORK_GROUP_SIZE error

痴心易碎 提交于 2020-01-06 19:50:13

问题


If I use a barrier (no matter if CLK_LOCAL_MEM_FENCE or CLK_GLOBAL_MEM_FENCE) in my kernel, it causes a CL_INVALID_WORK_GROUP_SIZE error. The global work size is 512, the local work size is 128, 65536 items have to be computed, the max work group size of my device is 1024, I am using only one dimension. For Java bindings I use JOCL. The kernel is very simple:

kernel void sum(global float *input, global float *output, const int numElements, local float *localCopy
{
    localCopy[get_local_id(0)] = grid[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE); // or barrier(CLK_GLOBAL_MEM_FENCE)
}

I run the kernel on the Intel(R) Xeon(R) CPU X5570 @ 2.93GHz and can use OpenCL 1.2. The calling method looks like

kernel.putArg(aCLBuffer).putArg(bCLBuffer).putArg(elementCount).putNullArg(localWorkSize);
queue.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize);

But the error is always the same:

[...]can not enqueue 1DRange CLKernel [...] with gwo: null gws: {512} lws: {128} 
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]

What I am doing wrong?


回答1:


This is expected behaviour on some OpenCL platforms. For example, on my Apple system, the CPU device has a maximum work-group size of 1024. However, if a kernel has a barrier inside, then the maximum work-group size for that specific kernel is reduced to 1.

You can query the maximum work-group size for a specific kernel by using the clGetKernelWorkGroupInfo function with the CL_KERNEL_WORK_GROUP_SIZE parameter. The value returned will be no more than the value returned by clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE, but is allowed to be less (as it is in this case).



来源:https://stackoverflow.com/questions/26278448/using-a-barrier-causes-a-cl-invalid-work-group-size-error

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!