nvlink, relocatable device code and static device libraries

Submitted by Deadly on 2019-12-06 09:33:40

I suggest putting a complete simple example in the question, just as I have done below. External links to code are frowned on. When they go stale, the question becomes less valuable.

Yes, you have an error in generating libutil.a. Creating a static library with exposed device linking is not the same as creating a shared library with (by definition) no exposed device linking. Notice my mention of a "CUDA-free wrapper" in the previous question you linked. The example in this question has exposed device linking because my_square is defined in the library but is called by code outside the library.

Review the nvcc relocatable device code compiling examples and you will find one that generates a device-linkable static library. There is no device-link step in the static library creation. The device-link step is done at final executable creation (or, in this case, at creation of the .so, i.e. the "CUDA boundary"). The "extra" device-link operation during static library creation is the proximate cause of the error you are observing.
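The library-creation rule above can be sketched as a small script. This is only an illustration, assuming nvcc is on your PATH and a util.cu like the one in the worked example sits in the current directory. The function name build_util_lib and the ARCH variable are my own placeholders, not part of any toolchain.

```shell
#!/bin/sh
# Sketch: build a device-linkable static library from util.cu.
# Assumptions: nvcc is on PATH, util.cu is in the current directory.
ARCH=sm_35   # assumption: change to match your GPU and toolkit

build_util_lib() {
  # 1. Compile to relocatable device code (-dc is shorthand for -rdc=true -c).
  nvcc -arch="$ARCH" -Xcompiler -fPIC -dc util.cu -o util.o || return 1
  # 2. Archive only. Do NOT run "nvcc -dlink" on util.o at this stage:
  #    the device link belongs to whoever consumes libutil.a.
  nvcc -arch="$ARCH" -lib util.o -o libutil.a
}

# Only attempt the build when the prerequisites are present.
if command -v nvcc >/dev/null 2>&1 && [ -f util.cu ]; then
  build_util_lib && echo "built libutil.a"
else
  echo "skipped: nvcc or util.cu not found"
fi
```

The point of the sketch is what is missing: there is no -dlink invocation between the compile and the archive step, so the device code in libutil.a stays unlinked until the consumer's link.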

Here's a fully worked example:

$ cat util.h

__device__ float my_square(float);

$ cat util.cu

__device__ float my_square(float val){ return val*val;}

$ cat test.h

float dbl_sq(float val);

$ cat test.cu
#include "util.h"

__global__ void my_dbl_sq(float *val){
  *val = 2*my_square(*val);
}

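// Host wrapper: the only symbol that libtest.so exposes to CUDA-free callers.
float dbl_sq(float val){
  float *d_val, h_val;
  cudaMalloc(&d_val, sizeof(float));
  h_val = val;
  cudaMemcpy(d_val, &h_val, sizeof(float), cudaMemcpyHostToDevice);
  my_dbl_sq<<<1,1>>>(d_val);
  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy(&h_val, d_val, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_val);  // release the device allocation before returning
  return h_val;
}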
$ cat main.cpp
#include <stdio.h>
#include "test.h"

int main(){

  printf("%f\n", dbl_sq(2.0f));
  return 0;
}
$ nvcc -arch=sm_35 -Xcompiler -fPIC -dc util.cu
$ nvcc -arch=sm_35 -Xcompiler -fPIC -lib util.o -o libutil.a
$ nvcc -arch=sm_35 -Xcompiler -fPIC -dc test.cu
$ nvcc -arch=sm_35 -shared -Xcompiler -fPIC -L. -lutil test.o -o libtest.so
$ g++ -o main main.cpp libtest.so
$ cuda-memcheck ./main
========= CUDA-MEMCHECK
8.000000
========= ERROR SUMMARY: 0 errors
$

In this example, device linking occurs automatically in the nvcc invocation that creates the .so library. I had already set my LD_LIBRARY_PATH environment variable to include my working directory. Tested using CUDA 6.5 on CentOS 6.2.

(Note that it is possible to perform multiple device-link operations during the creation of an executable, but these device-link operations must be in separate link domains, i.e. user code or user-code entry points cannot be shared between the domains. That is not the case here.)
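To make the automatic device link visible, here is a sketch of how the same .so could be built with an explicit device-link step followed by a host link with g++. I have not tested this exact sequence against the example, so treat it as an approximation of what the nvcc -shared command does. The function name link_libtest_explicit, the output name test_dlink.o, and the default CUDA_HOME of /usr/local/cuda are my assumptions.

```shell
#!/bin/sh
# Sketch: explicit device link for libtest.so, then a host link with g++.
# Assumptions: test.o (compiled with -dc) and libutil.a are in the current directory.
ARCH=sm_35                                  # assumption: match your GPU
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"   # assumption: default toolkit location

link_libtest_explicit() {
  # Device link: resolve my_square from libutil.a against the kernel in test.o.
  # This is the single device-link step, performed at the CUDA boundary.
  nvcc -arch="$ARCH" -Xcompiler -fPIC -dlink test.o -L. -lutil \
       -o test_dlink.o || return 1
  # Host link: combine the host objects, the device-link output, and the CUDA runtime.
  g++ -shared -o libtest.so test.o test_dlink.o -L. -lutil \
      -L"$CUDA_HOME/lib64" -lcudart
}

# Only attempt the link when the prerequisites are present.
if command -v nvcc >/dev/null 2>&1 && [ -f test.o ] && [ -f libutil.a ]; then
  link_libtest_explicit && echo "built libtest.so"
else
  echo "skipped: nvcc, test.o, or libutil.a not found"
fi
```

Either way there is exactly one device-link operation, and it covers both test.o and the device code pulled in from libutil.a.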
