CUDA: How to link a specific obj, ptx, cubin from a separate compilation?

不打扰是莪最后的温柔 submitted on 2019-12-25 02:16:04

Question


I have a fairly large CUDA/C++ project that compiles to a static library. The toolchain is CUDA Toolkit 9.0/9.2 and VS 2017, and I cannot change the company toolchain. Our most expensive kernel was hit by an nvcc compiler regression introduced in the 9.0 Toolkit. I filed this on the NVIDIA developer website and received confirmation of the regression. That was about a year ago, and the ticket is still open. Maybe the 10.0 Toolkit will fix it.

But I cannot wait. So my plan is to compile just this one kernel with the CUDA 8.0 nvcc and the v140 (VS 2015) host compiler. It consists of a single .hpp file with the __device__ decorator on the kernel declaration, and a .cu file with the definition. The kernel does not call other kernels; it is a rather simple kernel.

From the v140 Native Tools Command Prompt, I executed:

nvcc -x cu -arch=sm_61 -dc kernel.cu

This produced a kernel.obj file. I have read the CUDA Compiler Driver NVCC documentation, but I confess I do not entirely understand it. There are several compilation phases, and I cannot tell which path is the correct one for my case.

My question: how do I link this object file into my larger static library? If someone could point me to the correct series of commands or, better yet, show how to include it in the VS project (presumably with kernel.hpp and kernel.obj), I would be most grateful.
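For reference, the separate-compilation path for an object built with -dc can be sketched as three nvcc steps: compile to relocatable device code, device-link, then archive. This is only a sketch under assumptions: the names kernel_dlink.obj and kernel.lib are hypothetical, the run helper is an illustrative dry-run wrapper (it prints each command and executes it only when RUN=1), and whether an 8.0-built library mixes cleanly with the 9.x project still has to be verified.

```shell
# Dry run by default: each command is printed; set RUN=1 to execute it.
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

ARCH=sm_61                  # architecture used in the question
SRC=kernel.cu               # source file from the question
OBJ=kernel.obj              # object produced by -dc in the question
DLINK_OBJ=kernel_dlink.obj  # hypothetical name for the device-link output
LIB=kernel.lib              # hypothetical name for the static library

# 1. Compile to relocatable device code (the step already done in the question).
run nvcc -x cu -arch="$ARCH" -dc "$SRC" -o "$OBJ"

# 2. Device-link: resolve the relocatable device code into a host-linkable object.
run nvcc -arch="$ARCH" -dlink "$OBJ" -o "$DLINK_OBJ"

# 3. Archive the host object and the device-link object into a static library.
run nvcc -lib "$OBJ" "$DLINK_OBJ" -o "$LIB"
```

At the v140 Native Tools Command Prompt, the three nvcc lines can be typed directly with the values substituted; the wrapper only exists so the sequence can be inspected without a CUDA installation.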


Answer 1:


Following Njuffa's comment above, the simplest solution is to create a static library for that kernel using the earlier, performant toolchain (VS 2015 and CUDA 8.0 Toolkit), then link that library into the larger project built with the later toolchain. I did so with success.

I created a CUDA 8.0 template project in VS 2015 containing only the kernel source and header, with the compilation target set to static library. This produced a .lib file. The .lib file and header were then added to the C++ linker settings of the larger project, which uses VS 2017 and CUDA 9.0. All test executables that use this static library pass. This is a much simpler solution than trying to recompile through an intermediate compilation format (ptx, cubin, etc.).
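A rough command-line equivalent of that workflow might look like the sketch below. It is not what the VS projects actually run: main.cu and app.exe are hypothetical names standing in for the larger project, the run helper is an illustrative dry-run wrapper (it prints each command and executes it only when RUN=1), and the sketch assumes the library exposes a host-callable entry point declared in kernel.hpp.

```shell
# Dry run by default: each command is printed; set RUN=1 to execute it.
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

ARCH=sm_61       # architecture used in the question
LIB=kernel.lib   # hypothetical library name

# With the CUDA 8.0 nvcc (v140 prompt): compile kernel.cu and archive it in one step.
run nvcc -arch="$ARCH" -lib kernel.cu -o "$LIB"

# With the CUDA 9.x nvcc (VS 2017 prompt): pass the prebuilt library as a linker input.
# main.cu and app.exe are placeholders for a consumer of the library.
run nvcc -arch="$ARCH" main.cu "$LIB" -o app.exe
```

Inside Visual Studio, the second step corresponds to listing the .lib among the project's additional dependencies and adding the header's folder to the include directories.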

Ultimately, though, the real solution was to refactor the kernel to use shared memory more efficiently, which removed the need for the older nvcc version.



Source: https://stackoverflow.com/questions/51770529/cuda-how-to-link-a-specific-obj-ptx-cubin-from-a-separate-compilation
