Reduce in Cuda for arbitrary number of elements

♀尐吖头ヾ submitted on 2019-12-01 12:14:55

Question


How can I implement version 7 of the code given in the following link: http://www.cuvilib.com/Reduction.pdf
for an input array whose size is an arbitrary number, in other words, not a power of 2?


Answer 1:


Version 7 already handles an arbitrary number of elements.

Rather than relying on the cuvilib link, look at the relevant NVIDIA CUDA reduction sample. It includes essentially the same PDF you are using, along with sample code that implements reductions 1 through 7 (labelled reduce0 through reduce6).

If you study the description of reduction 7 in the document, you'll see that the initial reduction steps are handled by a while loop that makes the whole grid stride through memory. As the grid strides, each thread accumulates multiple input elements into its own running sum.

This initial while loop is not limited to a particular size of problem (e.g. power of 2).

Because this while loop handles the initial part of the reduction, the later steps can run as an efficient power-of-2 reduction at the threadblock level, as discussed earlier in that document. Only the threadblock size needs to be a power of 2; the size of the input set does not.
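To make the two phases concrete, here is a minimal sketch in the spirit of reduce6. It is not the sample code itself. The names `reduceSum`, `reduceSumHost`, and `kBlockSize`, the choice of 64 blocks, and the use of `int` data are assumptions made for illustration. The warp-level unrolling from the PDF is replaced by a plain loop with `__syncthreads()` at every step, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size for this sketch; it must be a power of two.
constexpr unsigned int kBlockSize = 256;

__global__ void reduceSum(const int *in, int *out, unsigned int n) {
    __shared__ int sdata[kBlockSize];

    unsigned int tid = threadIdx.x;
    // Each thread starts at its own element and reads a second one
    // kBlockSize further on, as in the later versions in the PDF.
    unsigned int i = blockIdx.x * (kBlockSize * 2) + tid;
    unsigned int gridSize = kBlockSize * 2 * gridDim.x;

    // Phase 1: stride through the input. n can be any value; the bounds
    // checks are what remove the power-of-2 restriction on the input.
    int mySum = 0;
    while (i < n) {
        mySum += in[i];
        if (i + kBlockSize < n) mySum += in[i + kBlockSize];
        i += gridSize;
    }
    sdata[tid] = mySum;
    __syncthreads();

    // Phase 2: tree reduction in shared memory. Only the block size has to
    // be a power of two here, and it always is.
    for (unsigned int s = kBlockSize / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // One partial sum per block.
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: launches the kernel once and adds up the
// per-block partial sums on the CPU.
int reduceSumHost(const std::vector<int> &h_in) {
    unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0;

    const unsigned int blocks = 64;  // arbitrary grid size, independent of n
    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduceSum<<<blocks, kBlockSize>>>(d_in, d_out, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Because the grid size is fixed independently of `n`, blocks that find no work simply write a partial sum of zero, and inputs such as 7 or 1,000,003 elements take the same code path as a power-of-2 input.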

Please study the code given in the CUDA sample (reduce6).



Source: https://stackoverflow.com/questions/20048895/reduce-in-cuda-for-arbitrary-number-of-elements
