Do the threads in a CUDA warp execute in parallel on a multiprocessor?

Posted by 邮差的信 on 2019-12-11 16:15:43

Question


A warp is 32 threads. Do those 32 threads execute in parallel on a multiprocessor? If they do not execute in parallel, then there should be no race condition within a warp. This doubt came up after going through some examples.


Answer 1:


In the CUDA programming model, all the threads within a warp run in parallel. The actual execution in hardware may not be fully parallel, though, because the number of cores within an SM (Streaming Multiprocessor) can be less than 32. For example, the GT200 architecture has 8 cores per SM, so the 32 threads of a warp need 32 / 8 = 4 clock cycles to get through one instruction.

If multiple threads write to the same location (in either shared memory or global memory) and you want to avoid a race, you have to use atomic operations or locks, because the CUDA programming model does not guarantee which thread's write ends up in memory. This applies to threads of the same warp as well.
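To make the point about atomics concrete, here is a minimal sketch in which one warp's worth of threads (32) all update the same global counter. The kernel names `count_atomic` and `count_racy` and the host helper `run_counter` are made up for this illustration; they are not from the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread increments the same counter with an atomic read-modify-write.
__global__ void count_atomic(int *counter) {
    atomicAdd(counter, 1);
}

// Every thread does a plain read-modify-write on the same address.
// This is a data race: the final value is not defined by the programming model.
__global__ void count_racy(int *counter) {
    *counter = *counter + 1;
}

// Hypothetical helper: launches a single block of `threads` threads and
// returns the final counter value, or -1 if any CUDA call fails.
int run_counter(bool use_atomic, int threads) {
    int *d_counter = nullptr;
    int h_counter = 0;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1;
    cudaError_t err = cudaMemcpy(d_counter, &h_counter, sizeof(int),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        if (use_atomic) count_atomic<<<1, threads>>>(d_counter);
        else            count_racy<<<1, threads>>>(d_counter);
        err = cudaGetLastError();              // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return err == cudaSuccess ? h_counter : -1;
}

int main() {
    // 32 threads = exactly one warp.
    printf("atomic: %d\n", run_counter(true, 32));   // 32 on a working GPU
    printf("racy:   %d\n", run_counter(false, 32));  // unspecified, often less than 32
    return 0;
}
```

With `atomicAdd` the counter reliably ends at 32. The plain version may produce a smaller value even though all 32 threads belong to one warp, which is why a warp by itself does not rule out race conditions.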




Answer 2:


Yes. The 32 threads in a warp execute in parallel. The GPU is a SIMT (single instruction, multiple threads) machine: a single instruction is executed by multiple threads in parallel.

By the way, SIMT is somewhat of a marketing term; it is basically the same as SIMD.
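As a rough illustration of how the threads of a block map onto warps, the sketch below has every thread record its warp index and its lane (position within the warp) using the built-in `warpSize` variable. The names `record_lanes` and `lane_of_thread` are hypothetical, and the code assumes a one-dimensional block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_THREADS 64  // two warps when warpSize is 32

// Each thread writes which warp it belongs to and its lane inside that warp.
// Assumes a 1D block, so threadIdx.x is the linear thread index.
__global__ void record_lanes(int *warp_id, int *lane_id) {
    int tid = threadIdx.x;
    warp_id[tid] = tid / warpSize;
    lane_id[tid] = tid % warpSize;
}

// Hypothetical helper: returns the lane recorded by thread `tid`
// (0 <= tid < N_THREADS), or -1 if any CUDA call fails.
int lane_of_thread(int tid) {
    int *d_warp = nullptr, *d_lane = nullptr;
    int h_lane[N_THREADS];
    int result = -1;
    if (cudaMalloc(&d_warp, N_THREADS * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_lane, N_THREADS * sizeof(int)) != cudaSuccess) {
        cudaFree(d_warp);
        return -1;
    }
    record_lanes<<<1, N_THREADS>>>(d_warp, d_lane);
    // This copy waits for the kernel to finish before reading the results.
    if (cudaGetLastError() == cudaSuccess &&
        cudaMemcpy(h_lane, d_lane, sizeof(h_lane),
                   cudaMemcpyDeviceToHost) == cudaSuccess) {
        result = h_lane[tid];
    }
    cudaFree(d_warp);
    cudaFree(d_lane);
    return result;
}

int main() {
    // With warpSize == 32, thread 33 is lane 1 of the second warp.
    printf("lane of thread 33: %d\n", lane_of_thread(33));
    return 0;
}
```

All threads that share a warp index are issued the same instruction together; the lane only identifies which of the warp's data elements a thread works on.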



Source: https://stackoverflow.com/questions/5268103/cuda-threads-in-a-wrap
