CUDA and nvcc: using the preprocessor to choose between float or double

别跟我提以往 2020-11-30 06:16

The problem:

Given a .h header, I want to define real to be double when compiling for C/C++ or for CUDA with compute capability >= 1.3. If compiling for

2 answers
  •  执笔经年
    2020-11-30 07:01

    It seems you might be conflating two things: how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.

    Consider the following code snippet:

    #ifdef __CUDACC__
    #warning using nvcc
    
    template <typename T>
    __global__ void add(T *x, T *y, T *z)
    {
        int idx = threadIdx.x + blockDim.x * blockIdx.x;
    
        z[idx] = x[idx] + y[idx];
    }
    
    #ifdef __CUDA_ARCH__
    #warning device code trajectory
    #if __CUDA_ARCH__ > 120
    #warning compiling with double precision
    template void add<double>(double *, double *, double *);
    #else
    #warning compiling with single precision
    template void add<float>(float *, float *, float *);
    #endif
    #else
    #warning nvcc host code trajectory
    #endif
    #else
    #warning non-nvcc code trajectory
    #endif
    

    Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:

    $ ln -s cudaarch.cu cudaarch.cc
    $ gcc -c cudaarch.cc -o cudaarch.o
    cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory
    
    $ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:14:2: warning: #warning device code trajectory
    cudaarch.cu:19:2: warning: #warning compiling with single precision
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
    ptxas info    : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
    ptxas info    : Used 4 registers, 12+16 bytes smem
    
    $ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:14:2: warning: #warning device code trajectory
    cudaarch.cu:16:2: warning: #warning compiling with double precision
    cudaarch.cu:3:2: warning: #warning using nvcc
    cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
    ptxas info    : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
    ptxas info    : Used 8 registers, 44 bytes cmem[0]
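
    The mangled entry names in the ptxas output show which instantiation was built for each architecture. As a rough sketch, assuming a GCC- or Clang-compatible host toolchain (which provides abi::__cxa_demangle in <cxxabi.h>; the demangle helper below is a hypothetical name, not part of CUDA), they can be decoded like this:

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang extension, not standard C++)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char *mangled)
{
    int status = 0;
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : mangled;
    std::free(out);   // the buffer is malloc-allocated; free(nullptr) is a no-op
    return result;
}
```

    Passing '_Z3addIfEvPT_S1_S1_' and '_Z3addIdEvPT_S1_S1_' should yield the add<float> and add<double> instantiations respectively, matching the single and double precision warnings above. The c++filt command-line tool does the same job.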
    

    The takeaway points from this are:

    • __CUDACC__ defines whether nvcc is steering compilation or not
    • __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
    • __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc

    Those three pieces of information are always enough to conditionally compile device code for different CUDA architectures, host-side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
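
    Applied to the header in the question, those rules might translate into something like the following. This is only a sketch: the float fallback for compute capability below 1.3 is assumed from the title of the question, and REAL_TRAJECTORY and report_real are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdio>

#if !defined(__CUDACC__)
// Not steered by nvcc: ordinary C/C++ host compiler.
typedef double real;
#define REAL_TRAJECTORY "non-nvcc host"
#elif !defined(__CUDA_ARCH__)
// nvcc host trajectory: __CUDA_ARCH__ is undefined here, so the target
// architecture cannot be tested in this pass.
typedef double real;
#define REAL_TRAJECTORY "nvcc host"
#elif __CUDA_ARCH__ >= 130
// nvcc device trajectory, compute capability 1.3 or newer.
typedef double real;
#define REAL_TRAJECTORY "nvcc device, double"
#else
// nvcc device trajectory, compute capability below 1.3 (assumed float fallback).
typedef float real;
#define REAL_TRAJECTORY "nvcc device, float"
#endif

// Host-side helper: prints which branch was taken and the width of real.
inline void report_real()
{
    assert(sizeof(real) == sizeof(float) || sizeof(real) == sizeof(double));
    std::printf("%s: sizeof(real) = %zu\n", REAL_TRAJECTORY, sizeof(real));
}
```

    Note the consequence of the second takeaway point: the nvcc host trajectory cannot see the target architecture, so with a pre-1.3 target the host side would still pick double while the device side picks float. Host buffers handed to kernels would then need to be sized and converted for the device type, or the precision chosen by an explicit build flag instead.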
