SIMD-8,SIMD-16 or SIMD-32 in opencl on gpgpu

I read couple of questions on SO for this topic(SIMD Mode), but still slight clarification/confirmation of how things work is required.

Why use SIMD if we have GPGPU?

SIMD intrinsics - are they usable on gpus?

CPU SIMD vs GPU SIMD?

Are following points correct,if I compile the code in SIMD-8 mode ? 1) it means 8 instructions of different work items are getting executing in parallel.

2) Does it mean All work items are executing the same instruction only?

3) if each wrok item code contains vload16 load then float16 operations and then vstore16 operations only. SIMD-8 mode will still work. I mean to say is it true GPU is till executing the same instruction (either vload16/ float16 / vstore16) for all 8 work items?

How should I understand this concept?

In the past many OpenCL vendors required to use vector types to be able to use SIMD. Nowadays OpenCL vendors are packing work items into SIMD so there is no need to use vector types. Whether is preffered to use vector types can be checked by querying for: CL_DEVICE_PREFERRED_VECTOR_WIDTH_<CHAR, SHORT, INT, LONG, FLOAT, DOUBLE>.

On Intel if vector type is used the vectorizer first scalarize them and then re-vectorize to make use of the wide instruction set. This is probably going to be similar on the other platforms.

来源：https://stackoverflow.com/questions/31753304/simd-8-simd-16-or-simd-32-in-opencl-on-gpgpu

标签

opencl

gpgpu

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!