Is there a rule of thumb for keeping the compiler happy when it looks at a kernel and assigns registers?
The compiler has a lot of flexibility, but I worry that it m
There is an option for NVIDIA platforms that even works programmatically, without the SDK. (Maybe there is something similar for AMD cards?)
You can pass "-cl-nv-verbose" as the build options string when calling clBuildProgram. The compiler then writes additional information to the build log, which you can retrieve afterwards with clGetProgramBuildInfo.
clBuildProgram(program, 0, NULL, "-cl-nv-verbose", NULL, NULL);
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, ...);
(Sorry, I'm not sure about the Python syntax for this.)
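For the Python side, a rough PyOpenCL equivalent of the two calls above might look like the sketch below. It assumes the pyopencl package and an NVIDIA OpenCL platform. The helper name build_verbose and the constant VERBOSE_OPTIONS are made up for illustration, and the import is done inside the function so the snippet can be loaded on a machine without PyOpenCL.

```python
# NVIDIA-specific build option; other platforms may reject or ignore it.
VERBOSE_OPTIONS = ["-cl-nv-verbose"]


def build_verbose(context, source, extra_options=()):
    """Build `source` for `context` and return (program, {device name: build log}).

    Hypothetical helper: `context` is a pyopencl.Context, `source` is the
    OpenCL C kernel source as a string.
    """
    import pyopencl as cl  # lazy import; assumes pyopencl is installed

    options = VERBOSE_OPTIONS + list(extra_options)
    program = cl.Program(context, source).build(options=options)

    # One log per device, mirroring clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...)
    logs = {
        device.name: program.get_build_info(device, cl.program_build_info.LOG)
        for device in context.devices
    }
    return program, logs
```

Typical use would be something like `program, logs = build_verbose(cl.create_some_context(), kernel_source)` followed by printing each entry of `logs`. One caveat, as far as I know: PyOpenCL caches compiled binaries, and a cache hit can leave the log empty, so setting the PYOPENCL_NO_CACHE environment variable may be needed to see the ptxas output.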
The result should be a string containing the desired information. For a simple vector addition kernel, the log looks like this:
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function 'sampleKernel' for 'sm_21'
ptxas : info : Function properties for sampleKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 4 registers, 44 bytes cmem[0], 4 bytes cmem[16]
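If you want to check register use automatically rather than read the log by eye, the text can be picked apart with a few regular expressions. This is only a sketch that assumes the log has exactly the layout shown above; the format is not a documented interface and may differ between driver versions. The names parse_ptxas_log and SAMPLE_LOG are made up for illustration.

```python
import re


def parse_ptxas_log(log):
    """Return {kernel name: {"registers", "spill_stores", "spill_loads"}} from a ptxas log."""
    stats = {}
    current = None  # kernel whose statistics are currently being read
    for line in log.splitlines():
        entry = re.search(r"Compiling entry function '([^']+)'", line)
        if entry:
            current = entry.group(1)
            stats[current] = {}
            continue
        if current is None:
            continue  # lines before the first kernel (e.g. the gmem summary)
        spills = re.search(r"(\d+) bytes spill stores, (\d+) bytes spill loads", line)
        if spills:
            stats[current]["spill_stores"] = int(spills.group(1))
            stats[current]["spill_loads"] = int(spills.group(2))
        used = re.search(r"Used (\d+) registers", line)
        if used:
            stats[current]["registers"] = int(used.group(1))
    return stats


# The log from the vector addition example above.
SAMPLE_LOG = """\
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function 'sampleKernel' for 'sm_21'
ptxas : info : Function properties for sampleKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 4 registers, 44 bytes cmem[0], 4 bytes cmem[16]
"""

print(parse_ptxas_log(SAMPLE_LOG))
```

Non-zero spill values are the thing to watch for: they mean the compiler ran out of registers for that kernel and moved values to slower memory.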
You can also use the "-cl-nv-maxrregcount=..." option to cap the number of registers the compiler may use per kernel. Of course, all of this is device- and platform-specific, so it should be used with care.