Including C standard headers in CUDA NVRTC code

名媛妹妹 2020-12-21 01:49

I'm writing a CUDA kernel that is compiled at runtime using NVRTC (CUDA version 9.2 with NVRTC version 7.5), which needs the stdint.h header, in order to have

1 Answer
  • 2020-12-21 02:29

    [Preface: this is a very hacky answer, and is specific to the GNU toolchain (although I suspect the problem in the question is also specific to the GNU toolchain)].

    It would appear that the problem here is the GNU standard header features.h, which gets pulled in by stdint.h and winds up defining a lot of stub functions. These have the default __host__ compilation space and cause NVRTC to blow up. It also seems that the -default-device option results in a resolved glibc compiler feature set that makes the whole NVRTC compilation fail.

    You can defeat this (in a very hacky way) by predefining a feature set for the standard library which excludes all the host functions. Changing your JIT kernel code to

    const char program_source[] = R"%%%(
    #define __ASSEMBLER__
    #define __extension__
    #include <stdint.h>
    extern "C" __global__ void f(int32_t* in, int32_t* out) {
        out[threadIdx.x] = in[threadIdx.x];
    }
    )%%%";
    

    got me this:

    $ nvcc -std=c++14 -ccbin=g++-7 jit_header.cu -o jitheader -lnvrtc -lcuda
    $ ./jitheader 
    PTX code:
    //
    // Generated by NVIDIA NVVM Compiler
    //
    // Compiler Build ID: CL-24330188
    // Cuda compilation tools, release 9.2, V9.2.148
    // Based on LLVM 3.4svn
    //
    
    .version 6.2
    .target sm_30
    .address_size 64
    
        // .globl   f
    
    .visible .entry f(
        .param .u64 f_param_0,
        .param .u64 f_param_1
    )
    {
        .reg .b32   %r<3>;
        .reg .b64   %rd<8>;
    
    
        ld.param.u64    %rd1, [f_param_0];
        ld.param.u64    %rd2, [f_param_1];
        cvta.to.global.u64  %rd3, %rd2;
        cvta.to.global.u64  %rd4, %rd1;
        mov.u32     %r1, %tid.x;
        mul.wide.u32    %rd5, %r1, 4;
        add.s64     %rd6, %rd4, %rd5;
        ld.global.u32   %r2, [%rd6];
        add.s64     %rd7, %rd3, %rd5;
        st.global.u32   [%rd7], %r2;
        ret;
    }
    

    Big caveat: This worked on the glibc system I tried it on. It probably won't work with other toolchains or libc implementations (if, indeed, they have this problem).
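
If the kernel text is assembled on the host rather than written as one literal, the same macro predefinition can be applied by prepending the two `#define` lines before any `#include` in the source. The following is only a minimal sketch: `with_glibc_guards` is a hypothetical helper name, not part of NVRTC, and it assumes the same glibc-specific behaviour described in this answer.

```cpp
#include <string>

// Hypothetical helper (not an NVRTC API): returns the kernel source with the
// two macro definitions from the answer placed ahead of everything else, so
// they are already defined when <stdint.h> is processed.
std::string with_glibc_guards(const std::string& kernel_source) {
    static const char guards[] =
        "#define __ASSEMBLER__\n"
        "#define __extension__\n";
    return std::string(guards) + kernel_source;
}
```

The returned string's `c_str()` would then be passed as the source argument of `nvrtcCreateProgram`, in place of `program_source`. The same caveat applies: this has only been reasoned about for the glibc case above.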
