Estimate OpenCL Register Use

不要未来只要你来 2020-12-12 03:02

Is there a rule of thumb for keeping the compiler happy when it looks at a kernel and assigns registers?

The compiler has a lot of flexibility, but I worry that it m

2 Answers

情话喂你 (OP)

2020-12-12 03:54
There is an option for NVIDIA platforms that even works programmatically, without the SDK. (Maybe there is something similar for AMD cards?)

Pass "-cl-nv-verbose" as the build options string when calling clBuildProgram. The compiler then writes additional information, including per-kernel register usage, to the build log, which you can retrieve afterwards with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG.
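```
clBuildProgram(program, 0, NULL, "-cl-nv-verbose", NULL, NULL);
/* Query the log size first, then fetch the log (free it when done) */
size_t logSize = 0;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
char *log = malloc(logSize);
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
```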
(Sorry, I'm not sure about the Python syntax for this.)

The result is a string containing the desired information. For a simple vector addition kernel, the log shows:
```
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function 'sampleKernel' for 'sm_21'
ptxas : info : Function properties for sampleKernel
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 4 registers, 44 bytes cmem[0], 4 bytes cmem[16]
```
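If you want to act on this information in code, the register count can be pulled out of the log string. This is a minimal sketch that assumes the log follows the ptxas format shown above ("Used N registers"); that format is driver-specific and not guaranteed to stay the same. `parse_register_count` is a hypothetical helper, not part of the OpenCL API, and it only reports the first matching entry, so a program with several kernels would need per-kernel handling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns N from the first "Used N registers" entry
   in an NVIDIA build log (as produced with -cl-nv-verbose), or -1 if no
   such entry exists. Assumes the ptxas log format shown above. */
int parse_register_count(const char *log)
{
    const char *p = strstr(log, "Used ");
    while (p != NULL) {
        int regs = 0;
        int consumed = 0; /* %n is only written if the whole pattern matched */
        if (sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 && consumed > 0)
            return regs;
        /* "Used " was followed by something else; keep searching */
        p = strstr(p + 1, "Used ");
    }
    return -1;
}
```

For the log above, `parse_register_count(log)` returns 4. Checking `%n` guards against lines such as "Used 8 bytes", where `sscanf` would otherwise report a successful integer conversion.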
You can also use the "-cl-nv-maxrregcount=..." option to set the maximum number of registers per thread, but of course all of this is device- and platform-specific, so use it with care.
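Because these options are NVIDIA-specific, one cautious approach is to add them only when the device reports an NVIDIA vendor string (for example, the value clGetDeviceInfo returns for CL_DEVICE_VENDOR). The sketch below assumes that; `build_options` is a hypothetical helper, the `-cl-nv-maxrregcount=N` spelling simply mirrors the option named above, and the limit of 32 in the usage note is an arbitrary example value. Check NVIDIA's documentation for the exact syntax your driver accepts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a clBuildProgram options string into out.
   NVIDIA-only flags are added only if the vendor string mentions NVIDIA.
   max_regs <= 0 means "do not limit registers".
   Returns 0 on success, -1 if out is too small. */
int build_options(const char *vendor, int max_regs, char *out, size_t out_size)
{
    int n;
    if (vendor == NULL || strstr(vendor, "NVIDIA") == NULL)
        n = snprintf(out, out_size, "%s", "");          /* no vendor flags */
    else if (max_regs > 0)
        n = snprintf(out, out_size, "-cl-nv-verbose -cl-nv-maxrregcount=%d", max_regs);
    else
        n = snprintf(out, out_size, "-cl-nv-verbose");
    /* snprintf returns the length it wanted to write; detect truncation */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, `build_options("NVIDIA Corporation", 32, opts, sizeof opts)` produces `-cl-nv-verbose -cl-nv-maxrregcount=32`, while for other vendors it produces an empty string that can be passed to clBuildProgram unchanged.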