OpenCL - How to I query for a device's SIMD width?

In CUDA, there is a concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market.

In ATI cards, there is a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64.

My question is, what can I do to query for this SIMD width at runtime for OpenCL?

I found the answer I was looking for. It turns out that you don't query the device for this information, you query the kernel object (in OpenCL). My source is:

http://www.hpc.lsu.edu/training/tutorials/sc10/tutorials/SC10Tutorials/docs/M13/M13.pdf

(Page 108)

which says:

The most efficient work group sizes are likely to be multiples of the native hardware execution width

wavefront size in AMD speak/warp size in Nvidia speak

Query device for CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

So, in short, the answer appears to be to call the clGetKernelWorkGroupInfo() method with a param name of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. See this link for more information on this method:

http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetKernelWorkGroupInfo.html

On AMD, you can query CL_DEVICE_WAVEFRONT_WIDTH_AMD. That's different from CL_DEVICE_SIMD_WIDTH_AMD, which returns the number of threads it executes in each clock cycle. The latter may be smaller than the wavefront size, in which case it takes multiple clock cycles to execute one instruction for all the threads in a wavefront.

On NVIDIA, you can query the warp size width using clGetDeviceInfo with CL_DEVICE_WARP_SIZE_NV (although this is always 32 for current GPUs), however, this is an extension, as OpenCL defines nothing like warps or wavefronts. I don't know about any AMD extension that would allow to query for the wavefront size.

For AMD: clGetDeviceInfo(..., CL_DEVICE_WAVEFRONT_WIDTH_AMD, ...) (if cl_amd_device_attribute_query extension supported)

For Nvidia: clGetDeviceInfo(..., CL_DEVICE_WARP_SIZE_NV, ...) (if cl_nv_device_attribute_query extension supported)

But there is no uniform way. The way suggested by Jonathan DeCarlo doesn't work, I was using it for GPUs if these two extensions does not supported - for example Intel iGPU, but recently I faced wrong results on Intel HD 4600:

Intel HD 4600 says CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=32 while in fact Intel GPUs seems to have wavefront equal to 16, so I faced incorrect results, everything works fine if barriers were used for wavefront=16.

P.S. I have not enough reputation to comment Jonathan DeCarlo answer about this, will be glad if somebody will add comment.

You can use the clGetDeviceInfo to get maximum number of workitems you can have in your local workset for each dimension. This is most likely multiple of your wavefront size.

See: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetDeviceInfo.html

For CUDA (using NVIDIA), please take a look at B.4.5 Cuda programming guide from NVIDIA. There is a variable for containing this information. You can query this variable at runtime. For AMD , I'm not sure if there is such a variable.

来源：https://stackoverflow.com/questions/7093488/opencl-how-to-i-query-for-a-devices-simd-width

标签

opencl

gpu

gpgpu