Question
As the title says, I would like to know whether there is a way to limit the number of registers used by each thread when I launch a kernel. Each thread performs a lot of computation, so register usage per thread is high and occupancy is low. I would like to reduce the number of registers per thread to improve parallel thread execution, possibly at the cost of more memory accesses.
I searched for an answer but did not find a solution. I believe it is possible to set a maximum number of registers per thread with the CUDA toolchain, but is it also possible when using Numba?
EDIT: Another option might be to force a minimum number of blocks to be resident on each multiprocessor, so that the compiler has to reduce the number of registers it uses.
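To make the register/occupancy trade-off described above concrete, here is a rough back-of-the-envelope sketch in plain Python. The hardware limits (65536 registers and 2048 resident threads per multiprocessor, at most 32 resident blocks) are assumptions that vary with compute capability, and the helper `estimate_occupancy` is hypothetical, not part of Numba or CUDA. It also ignores register allocation granularity and shared memory limits, so treat the result as an upper-bound estimate only.

```python
def estimate_occupancy(regs_per_thread, threads_per_block,
                       regs_per_sm=65536,        # assumed register file size per SM
                       max_threads_per_sm=2048,  # assumed resident-thread limit per SM
                       max_blocks_per_sm=32):    # assumed resident-block limit per SM
    """Roughly estimate occupancy when registers are the limiting resource."""
    # Registers consumed by one block (ignores allocation granularity).
    regs_per_block = regs_per_thread * threads_per_block
    # How many blocks fit on one SM under each limit.
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)
    # Occupancy: resident threads divided by the maximum resident threads.
    return blocks * threads_per_block / max_threads_per_sm


if __name__ == "__main__":
    for regs in (128, 64, 32):
        print(regs, estimate_occupancy(regs, threads_per_block=256))
```

Under these assumed limits, a kernel with 256 threads per block goes from an estimated occupancy of 0.25 at 128 registers per thread to 0.5 at 64 and 1.0 at 32, which is why capping registers can help even if it causes spills to local memory.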
Answer 1:
To the best of my knowledge, the cuda.jit facility offered by Numba does not allow passing arguments to the CUDA assembler to control register allocation, as is possible with the native CUDA toolchain.
So I don't think there is a way to do what you have asked about.
Source: https://stackoverflow.com/questions/46501369/how-to-limit-the-number-of-registers-used-by-each-thread-in-numba-cuda