Branch predication on GPU

痞子三分冷 提交于 2019-11-27 21:31:43

All of the CUDA capable architectures released so far operate like an SIMD machine. When there is branch divergence within a warp, both code paths are executed by all the threads in the warp, with the threads which are not following the active path executing the functional equivalent of a NOP (I think I recall that there is a conditional execution flag attached to each thread in a warp which allows non executing threads to be masked off).

So in your example, the 90 cycles answer is probably a better approximation of what really happens than the alternative.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!