OpenCL: Determine the best local_item_size

Posted by 你离开我真会死。 on 2019-12-13 04:47:32

Question


My code works like a 2D matrix multiplication (http://gpgpu-computing4.blogspot.de/2009/09/matrix-multiplication-2-opencl.html). The matrix dimensions are 1000*1000, 10000*10000 and 100000*100000.

My hardware is an NVIDIA Corporation GM204 [GeForce GTX 980] (MAX_WORK_GROUP_SIZES: 1024 1024 64).

The question is:

What is the best local_item_size I can use?

size_t local_item_size[2], global_item_size[2];
global_item_size[0] = number_of_points; 
global_item_size[1] = number_of_points; 
local_item_size[0] = 10; 
local_item_size[1] = 10;

Thanks in advance,


Answer 1:


On NVIDIA cards, the total number of work-items in a work-group should be a multiple of 32, the warp size, so 8*8 should be OK (10*10 = 100 is not a multiple of 32). The global work size must be a multiple of the local work size in each dimension, so it has to be adjusted as well.
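As a rough sketch of the two rules above, the host code could check the work-group total and pad the global size like this. The helper names is_warp_multiple and round_up are made up for illustration and are not part of the OpenCL API; the warp size of 32 is the NVIDIA value mentioned above.

```c
#include <stddef.h>

/* Warp size on NVIDIA GPUs (assumption taken from the answer above). */
#define WARP_SIZE 32

/* Hypothetical helper: returns 1 if the total number of work-items in a
   local_x * local_y work-group is a multiple of the warp size, else 0. */
static int is_warp_multiple(size_t local_x, size_t local_y)
{
    return (local_x * local_y) % WARP_SIZE == 0;
}

/* Hypothetical helper: rounds n up to the next multiple of local.
   local must be non-zero. */
static size_t round_up(size_t n, size_t local)
{
    return ((n + local - 1) / local) * local;
}
```

With these, global_item_size[0] = round_up(number_of_points, local_item_size[0]) (and the same for dimension 1) gives a global size that is valid for the chosen local size. For number_of_points = 1000, a local size of 8 needs no padding, while 16 pads to 1008.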

This may require changes to the kernel code too, so that it handles out-of-range indices (there may be more work-items than data elements).
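A minimal sketch of such a guard is shown below, with the kernel source embedded as a C string the way host code often carries it. The kernel name matmul, its argument list, and the helper count_active_items are assumptions for illustration, not the code from the linked tutorial. The early return is only safe here because this simple kernel uses no barriers.

```c
#include <stddef.h>

/* Hypothetical kernel: n is the real matrix dimension, passed as an extra
   argument so that padding work-items can detect they are out of range. */
static const char *kernel_source =
    "__kernel void matmul(__global const float *A,\n"
    "                     __global const float *B,\n"
    "                     __global float *C,\n"
    "                     const int n)\n"
    "{\n"
    "    int col = get_global_id(0);\n"
    "    int row = get_global_id(1);\n"
    "    if (row >= n || col >= n)\n"
    "        return; /* padding work-item: nothing to compute */\n"
    "    float sum = 0.0f;\n"
    "    for (int k = 0; k < n; ++k)\n"
    "        sum += A[row * n + k] * B[k * n + col];\n"
    "    C[row * n + col] = sum;\n"
    "}\n";

/* CPU-side mirror of the guard: walks a padded * padded index space and
   counts the work-items that would pass the bounds check for an n * n
   matrix. Illustration only; it does not run anything on the GPU. */
static size_t count_active_items(size_t n, size_t padded)
{
    size_t active = 0;
    for (size_t row = 0; row < padded; ++row)
        for (size_t col = 0; col < padded; ++col)
            if (row < n && col < n)
                ++active;
    return active;
}
```

For a 10*10 matrix padded to a 16*16 global size, only 100 of the 256 work-items pass the check; the rest return immediately without touching memory.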

Note that if you don't specify the local work-group size (i.e. you pass NULL as the local_work_size argument of clEnqueueNDRangeKernel), the driver will choose it automatically. It is not guaranteed to pick the best size, but it's worth trying.



Source: https://stackoverflow.com/questions/30593848/opencl-determine-the-best-local-item-size
