nvrtc

Why aren't integer division and modulo optimized out in NVRTC?

时光怂恿深爱的人放手 submitted on 2019-12-12 14:34:16
Question: I compiled a kernel with NVRTC: __global__ void kernel_A(/* args */) { unsigned short idx = threadIdx.x; unsigned char warp_id = idx / 32; unsigned char lane_id = idx % 32; /* ... */ } I know integer division and modulo are very costly on CUDA GPUs. However, I thought this kind of division by a power of 2 would be optimized into bit operations, until I found that it isn't: __global__ void kernel_B(/* args */) { unsigned short idx = threadIdx.x; unsigned char warp_id = idx >> 5; unsigned char lane_id =
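The equivalence the question relies on can be checked on the host. The sketch below is a minimal illustration, not the asker's code: the helper names are hypothetical, and the mask form `idx & 31` is an assumption about the usual counterpart of `idx % 32`, since the excerpt is cut off before kernel_B's `lane_id` expression. It only shows that the two forms give the same results for unsigned indices; it says nothing about what NVRTC actually emits.

```cpp
#include <cstdint>

// Hypothetical helper names, for illustration only.

// Warp index via division, as in kernel_A.
constexpr std::uint8_t warp_id_div(std::uint16_t idx) {
    return static_cast<std::uint8_t>(idx / 32);
}

// Warp index via right shift, as in kernel_B.
constexpr std::uint8_t warp_id_shift(std::uint16_t idx) {
    return static_cast<std::uint8_t>(idx >> 5);
}

// Lane index via modulo, as in kernel_A.
constexpr std::uint8_t lane_id_mod(std::uint16_t idx) {
    return static_cast<std::uint8_t>(idx % 32);
}

// Lane index via bit mask (assumed counterpart; not shown in the excerpt).
constexpr std::uint8_t lane_id_mask(std::uint16_t idx) {
    return static_cast<std::uint8_t>(idx & 31);
}

// Returns true if both forms agree for every index in [0, limit).
bool forms_agree(unsigned limit) {
    for (unsigned i = 0; i < limit; ++i) {
        const auto idx = static_cast<std::uint16_t>(i);
        if (warp_id_div(idx) != warp_id_shift(idx)) return false;
        if (lane_id_mod(idx) != lane_id_mask(idx)) return false;
    }
    return true;
}
```

Calling `forms_agree(1024)` covers every thread index of a 1024-thread block. Because the results match for unsigned values, a compiler is free to make this substitution; whether it does is what the question is asking.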

How do you include standard CUDA libraries to link with NVRTC code?

▼魔方 西西 submitted on 2019-11-28 13:46:44
Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram ), I have to send in the name of the file ( curand_kernel.h ) and also the source code of curand_kernel.h ? I feel like I shouldn't have to do that. It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have
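One alternative to passing header sources into nvrtcCreateProgram is to give the compiler a search directory with the documented `--include-path=<dir>` option. The sketch below is hedged: `include_path_option` and `compile_with_include_dir` are hypothetical helper names, the program name "kernel.cu" is arbitrary, and the toolkit include directory is an assumption that depends on the installation. Whether `curand_kernel.h` then compiles cleanly may also depend on the toolkit version.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the NVRTC option that adds `dir` to the
// header search path.
std::string include_path_option(const std::string& dir) {
    return "--include-path=" + dir;
}

// Guard so this file still builds on machines without the CUDA toolkit.
#if defined(__has_include)
#if __has_include(<nvrtc.h>)
#include <nvrtc.h>

// Compiles `source` with NVRTC, resolving #include directives from
// `cuda_include_dir` instead of from in-memory header strings.
// Returns true on success; `log` receives any compiler output.
bool compile_with_include_dir(const char* source,
                              const std::string& cuda_include_dir,
                              std::string& log) {
    nvrtcProgram prog;
    // No in-memory headers: numHeaders = 0, headers and includeNames null.
    if (nvrtcCreateProgram(&prog, source, "kernel.cu", 0, nullptr, nullptr)
            != NVRTC_SUCCESS) {
        return false;
    }

    const std::string opt = include_path_option(cuda_include_dir);
    const char* opts[] = { opt.c_str() };
    const nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

    // Fetch the compile log (the reported size includes the trailing NUL).
    std::size_t log_size = 0;
    if (nvrtcGetProgramLogSize(prog, &log_size) == NVRTC_SUCCESS && log_size > 1) {
        std::vector<char> buf(log_size);
        nvrtcGetProgramLog(prog, buf.data());
        log.assign(buf.data());
    }

    nvrtcDestroyProgram(&prog);
    return res == NVRTC_SUCCESS;
}
#endif
#endif
```

A call might look like `compile_with_include_dir(src, "/usr/local/cuda/include", log)`, where the path is only an example. With this approach the nested includes inside `curand_kernel.h` are resolved from the same directory, so they do not need to be listed one by one.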

How do you include standard CUDA libraries to link with NVRTC code?

别说谁变了你拦得住时间么 submitted on 2019-11-27 07:52:50
Question: Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram ), I have to send in the name of the file ( curand_kernel.h ) and also the source code of curand_kernel.h ? I feel like I shouldn't have to do that. It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I