Estimate OpenCL Register Use


Is there a rule of thumb for keeping the compiler happy when it looks at a kernel and assigns registers?

The compiler has a lot of flexibility, but I worry that it m

2 Answers
  •  情话喂你
    2020-12-12 03:54

    There is an option for NVIDIA platforms that even works programmatically, without the SDK. (Maybe there is something similar for AMD cards?)

    You can pass "-cl-nv-verbose" as the build options string when calling clBuildProgram. The compiler then writes additional information to the build log, which can be retrieved afterwards with clGetProgramBuildInfo.

    clBuildProgram(program, 0, NULL, "-cl-nv-verbose", NULL, NULL);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, ...);
    

    (Sorry, I'm not sure about the Python syntax for this.)

    The result should be a string containing the desired information. For a simple vector addition kernel, the log shows:

    ptxas : info : 0 bytes gmem
    ptxas : info : Compiling entry function 'sampleKernel' for 'sm_21'
    ptxas : info : Function properties for sampleKernel
        0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas : info : Used 4 registers, 44 bytes cmem[0], 4 bytes cmem[16]
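
    If you want to act on these numbers in code rather than read them by eye, the log can be scanned for the "Used N registers" line and the spill line. The following is a minimal C sketch, assuming the log keeps the ptxas format shown above (that format is not guaranteed and may differ between driver versions). parse_register_count and parse_spill_bytes are hypothetical helper names, not OpenCL functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of OpenCL): returns the register count from
   the first "Used N registers" entry in a -cl-nv-verbose build log,
   or -1 if no such entry is found. */
int parse_register_count(const char *build_log)
{
    const char *p = strstr(build_log, "Used ");
    while (p != NULL) {
        int regs = 0;
        int consumed = 0;
        /* %n is only stored if the literal "registers" matched as well. */
        if (sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 && consumed > 0)
            return regs;
        p = strstr(p + 1, "Used ");
    }
    return -1;
}

/* Hypothetical helper: extracts the spill store/load byte counts from the
   first "bytes stack frame, ..." entry. Returns 0 on success, -1 otherwise. */
int parse_spill_bytes(const char *build_log, int *stores, int *loads)
{
    const char *p = strstr(build_log, "bytes stack frame,");
    if (p == NULL)
        return -1;
    if (sscanf(p, "bytes stack frame, %d bytes spill stores, %d", stores, loads) == 2)
        return 0;
    return -1;
}
```

    Nonzero spill byte counts suggest that the kernel needed more registers than the compiler was willing or able to assign, so they are usually the first thing worth checking.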
    

    You can also use the "-cl-nv-maxrregcount=..." option to cap the number of registers the compiler may use. Both options are NVIDIA extensions, so they are device- and platform-specific and should be used with care.
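
    Since both flags go into the same options string, it can help to assemble that string in one place. This is a small C sketch under the assumption that you only want these two NVIDIA flags; make_nv_build_options is a hypothetical helper name, and the resulting string would be passed as the fourth argument of clBuildProgram.

```c
#include <stdio.h>

/* Hypothetical helper: writes the NVIDIA-specific build options into buf.
   max_regs <= 0 means "no register cap", so only -cl-nv-verbose is written.
   Returns 0 on success, -1 if buf is too small to hold the string. */
int make_nv_build_options(char *buf, size_t size, int max_regs)
{
    int n;
    if (max_regs > 0)
        n = snprintf(buf, size, "-cl-nv-verbose -cl-nv-maxrregcount=%d", max_regs);
    else
        n = snprintf(buf, size, "-cl-nv-verbose");
    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

    On a non-NVIDIA platform the build may reject these options (clBuildProgram can return CL_INVALID_BUILD_OPTIONS), so it is probably worth checking the platform vendor before using them.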
