Question
My code works like 2D matrix multiplication (http://gpgpu-computing4.blogspot.de/2009/09/matrix-multiplication-2-opencl.html). The dimensions of the matrices are 1000*1000, 10000*10000 and 100000*100000.
My hardware is: NVIDIA Corporation GM204 [GeForce GTX 980] (MAX_WORK_GROUP_SIZES: 1024 1024 64).
The question is:
What is the best local_item_size I can use?
size_t local_item_size[2], global_item_size[2];
global_item_size[0] = number_of_points;
global_item_size[1] = number_of_points;
local_item_size[0] = 10;
local_item_size[1] = 10;
Thanks in advance,
Answer 1:
On NVIDIA cards, the total number of work-items in a work-group should be a multiple of 32 (the warp size), so 8*8 = 64 should be OK. The global work size must be a multiple of the local work size in each dimension, so it has to be adjusted as well.
This may require some changes in the kernel code too, to handle out-of-range indices (there may be more work-items than data).
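As a rough sketch of the adjustment described above, the host code can round each global dimension up to the next multiple of the local size. The helper names round_up and padded_global_size are hypothetical (they are not part of the OpenCL API), and the 8*8 local size is just the value suggested in this answer, not a measured optimum.

```c
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of m (m must be > 0). */
static size_t round_up(size_t n, size_t m)
{
    size_t rem = n % m;
    return rem == 0 ? n : n + (m - rem);
}

/* Hypothetical helper: pad both dimensions of a 2D NDRange so that each
 * global size is a multiple of the corresponding local size. */
static void padded_global_size(const size_t points[2],
                               const size_t local[2],
                               size_t global[2])
{
    global[0] = round_up(points[0], local[0]);
    global[1] = round_up(points[1], local[1]);
}
```

With local_item_size = {8, 8}, a 1000*1000 problem stays at 1000*1000, while a local size of {16, 16} would pad it to 1008*1008. In the padded case the kernel would then need a guard that returns early when get_global_id(0) or get_global_id(1) is not less than number_of_points, with number_of_points passed in as an extra kernel argument.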
Note that if you don't specify the local work-group size (i.e. pass NULL for it), the driver will choose one automatically. It is not guaranteed to pick the best size, but it's worth trying.
Source: https://stackoverflow.com/questions/30593848/opencl-determine-the-best-local-item-size