I have read a couple of questions on SO about this topic (SIMD mode), but I still need some clarification/confirmation of how things work.
Why use SIMD if we have GPGPU?
SIMD intrinsics - are they usable on gpus?
Are the following points correct if I compile the code in SIMD-8 mode?
1) It means that 8 instructions from different work items are executing in parallel.
2) Does it mean that all work items are executing the same instruction?
3) Suppose each work item's code contains only a vload16, then float16 operations, and then a vstore16. SIMD-8 mode will still work. That is, is it true that the GPU is still executing the same instruction (vload16, a float16 operation, or vstore16) for all 8 work items?
How should I understand this concept?
In the past, many OpenCL vendors required vector types in order to use SIMD. Nowadays OpenCL vendors pack work items into SIMD lanes, so there is no need to use vector types. Whether vector types are preferred can be checked by querying clGetDeviceInfo for CL_DEVICE_PREFERRED_VECTOR_WIDTH_<CHAR, SHORT, INT, LONG, FLOAT, DOUBLE>.
On Intel, if vector types are used, the vectorizer first scalarizes them and then re-vectorizes the code to make use of the wide instruction set. This is probably similar on other platforms.
Source: https://stackoverflow.com/questions/31753304/simd-8-simd-16-or-simd-32-in-opencl-on-gpgpu