Including C standard headers in CUDA NVRTC code

I'm writing a CUDA kernel that is compiled at runtime using NVRTC (CUDA version 9.2 with NVRTC version 7.5), which needs the stdint.h header, in order to have

1 answer

没有蜡笔的小新

2020-12-21 02:29

[Preface: this is a very hacky answer, and is specific to the GNU toolchain (although I suspect the problem in the question is also specific to the GNU toolchain)].

It would appear that the problem here is with the GNU standard header features.h, which gets pulled into stdint.h and winds up defining a lot of stub functions that have the default __host__ compilation space and cause nvrtc to blow up. It also seems that the -default-device option results in a resolved glibc compiler feature set that makes the whole nvrtc compilation fail.

You can defeat this (in a very hacky way) by predefining a feature set for the standard library which excludes all the host functions. Changing your JIT kernel code to

const char program_source[] = R"%%%(
#define __ASSEMBLER__
#define __extension__
#include <stdint.h>
extern "C" __global__ void f(int32_t* in, int32_t* out) {
    out[threadIdx.x] = in[threadIdx.x];
}
)%%%";

got me this:

$ nvcc -std=c++14 -ccbin=g++-7 jit_header.cu -o jitheader -lnvrtc -lcuda
$ ./jitheader 
PTX code:
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-24330188
// Cuda compilation tools, release 9.2, V9.2.148
// Based on LLVM 3.4svn
//

.version 6.2
.target sm_30
.address_size 64

    // .globl   f

.visible .entry f(
    .param .u64 f_param_0,
    .param .u64 f_param_1
)
{
    .reg .b32   %r<3>;
    .reg .b64   %rd<8>;


    ld.param.u64    %rd1, [f_param_0];
    ld.param.u64    %rd2, [f_param_1];
    cvta.to.global.u64  %rd3, %rd2;
    cvta.to.global.u64  %rd4, %rd1;
    mov.u32     %r1, %tid.x;
    mul.wide.u32    %rd5, %r1, 4;
    add.s64     %rd6, %rd4, %rd5;
    ld.global.u32   %r2, [%rd6];
    add.s64     %rd7, %rd3, %rd5;
    st.global.u32   [%rd7], %r2;
    ret;
}

Big caveat: this worked on the glibc system I tried it on. It probably won't work with other toolchains or libc implementations (if, indeed, they have this problem).
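
Since the workaround is just two macro definitions placed ahead of the #include, and it may only be wanted on glibc systems, one option is to prepend them on the host side instead of hard-coding them into every kernel string. The following is a minimal sketch, not something from the original test program: the helper name with_glibc_workaround and its apply_workaround flag are hypothetical, and how you decide whether the headers NVRTC sees come from glibc is left to the caller.

```cpp
#include <string>

// Hypothetical helper (not part of NVRTC): returns the JIT source with the
// two workaround macros placed ahead of everything else, so that they are
// already defined when the kernel source includes <stdint.h>.
std::string with_glibc_workaround(const std::string& kernel_source,
                                  bool apply_workaround) {
    if (!apply_workaround) {
        // Leave the source untouched for toolchains without this problem.
        return kernel_source;
    }
    static const char prelude[] =
        "#define __ASSEMBLER__\n"
        "#define __extension__\n";
    return std::string(prelude) + kernel_source;
}
```

The returned string's c_str() can then be passed as the source argument of nvrtcCreateProgram in place of the hand-edited program_source above.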
