How to limit the number of registers used by each thread in Numba (CUDA)

Submitted by 拈花ヽ惹草 on 2019-12-25 09:56:15

Question


As the title says, I would like to know whether there is a way to limit the number of registers used by each thread when I launch a kernel. I perform a lot of computation in each thread, so register usage is high and occupancy is low. I would like to reduce the number of registers used in order to improve parallel thread execution, possibly at the cost of more memory accesses.

I searched for an answer but did not find a solution. I believe it is possible to set a maximum number of registers per thread with the CUDA toolchain, but is it also possible when using Numba?

EDIT: Another option might be to force a minimum number of blocks to be resident on each multiprocessor, so that the compiler has to reduce the number of registers used.
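To make the trade-off in the question concrete, here is a minimal sketch of how register usage can cap occupancy. The function name `estimate_occupancy` and the hardware limits (65536 registers, 2048 resident threads and 32 resident blocks per multiprocessor) are assumptions chosen for illustration, not values taken from the question. The model is deliberately simplified: it ignores register allocation granularity and shared memory limits, so treat the result as a rough upper bound.

```python
def estimate_occupancy(regs_per_thread, threads_per_block,
                       regs_per_sm=65536,        # assumed register file size per SM
                       max_threads_per_sm=2048,  # assumed resident-thread limit per SM
                       max_blocks_per_sm=32):    # assumed resident-block limit per SM
    """Return (resident blocks per SM, occupancy) under a simplified model.

    Only register pressure and the thread/block limits are considered;
    allocation granularity and shared memory are ignored.
    """
    regs_per_block = regs_per_thread * threads_per_block
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)
    occupancy = blocks * threads_per_block / max_threads_per_sm
    return blocks, occupancy


if __name__ == "__main__":
    # Compare a register-heavy kernel with progressively lighter ones.
    for regs in (128, 64, 32):
        blocks, occ = estimate_occupancy(regs, threads_per_block=256)
        print(f"{regs} regs/thread: {blocks} blocks/SM, occupancy {occ:.2f}")
```

Under these assumed limits, halving the registers per thread doubles the number of resident blocks until the thread limit becomes the bottleneck, which is the effect a register cap or a minimum-blocks hint is meant to achieve.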


Answer 1:


To the best of my knowledge, the `cuda.jit` facility offered by Numba does not allow passing arguments to the CUDA assembler to control register allocation, as is possible with the native CUDA toolchain (for example, via nvcc's `--maxrregcount` option or the `__launch_bounds__` qualifier).

So I don't think there is a way to do what you have asked about.



Source: https://stackoverflow.com/questions/46501369/how-to-limit-the-number-of-registers-used-by-each-thread-in-numba-cuda
