CUDA program not working for more than 1024 threads

Submitted by 无人久伴 on 2019-12-02 12:35:01
Robert Crovella

CUDA threadblocks are limited to 1024 threads (or 512 threads on compute capability 1.x GPUs). The threadblock size is given by the second kernel configuration parameter in the kernel launch:

    even<<<3,SIZE>>>(d_A,SIZE);
             ^^^^

So when you enter a SIZE value greater than 1024, this kernel will not launch.
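If the goal is simply to process more than 1024 elements, the usual approach is to keep the block size at or below the limit and raise the number of blocks instead. The following is a minimal sketch of that idea, not the code from the question: the `scale` kernel and the `blocksFor` helper are hypothetical names made up for illustration, and it assumes each element can be handled independently. A kernel that needs all threads to synchronize with each other (for example, some sorting kernels) cannot be split across blocks this simply.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not the `even` kernel from the question): each thread
// handles one element, identified by its global index across all blocks.
__global__ void scale(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the last block may have spare threads
        data[i] *= 2;
}

// Hypothetical helper: number of blocks needed to cover n elements, rounded up.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int SIZE = 4096;            // more elements than one block can hold
    const int threadsPerBlock = 256;  // must not exceed 1024
    int h_A[SIZE];
    for (int i = 0; i < SIZE; ++i)
        h_A[i] = i;

    int *d_A = nullptr;
    cudaMalloc(&d_A, SIZE * sizeof(int));
    cudaMemcpy(d_A, h_A, SIZE * sizeof(int), cudaMemcpyHostToDevice);

    // The first launch parameter is the number of blocks, the second is the
    // number of threads per block.
    scale<<<blocksFor(SIZE, threadsPerBlock), threadsPerBlock>>>(d_A, SIZE);

    cudaMemcpy(h_A, d_A, SIZE * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_A);

    printf("h_A[%d] = %d\n", SIZE - 1, h_A[SIZE - 1]);
    return 0;
}
```

Error checking is left out here only to keep the sketch short. The block size of 256 is an arbitrary choice for the example; any value up to the device limit would launch.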

You're getting no indication of the failed launch because you're not doing proper CUDA error checking, which is always a good idea any time you're having trouble with a CUDA code. As a quick test, you can also run your code with cuda-memcheck (superseded by compute-sanitizer in recent CUDA toolkits) to look for API errors.
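As a rough illustration of what such error checking might look like, here is a minimal sketch. The `CUDA_CHECK` macro and the `touch` kernel are hypothetical names chosen for this example; they are not part of the CUDA runtime or of the code in the question. The runtime calls themselves (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are real. The launch deliberately uses a block size of 2048, so the check after the launch should fail, typically with an "invalid configuration argument" message.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error with file and line, then exit,
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err_));   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical stand-in kernel: writes each thread's global index.
__global__ void touch(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = i;
}

int main()
{
    const int SIZE = 2048;  // deliberately above the 1024-thread block limit
    int *d_A = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, SIZE * sizeof(int)));

    touch<<<3, SIZE>>>(d_A, SIZE);
    CUDA_CHECK(cudaGetLastError());       // catches launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during kernel execution

    CUDA_CHECK(cudaFree(d_A));
    printf("kernel completed without errors\n");
    return 0;
}
```

A kernel launch does not return an error code itself, which is why the launch is followed by `cudaGetLastError()`. Changing `SIZE` to 1024 or less should let the program run to the final `printf`.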
