How to compile a CUDA kernel without any optimization at all?

Submitted by 隐身守侯 on 2019-12-12 14:44:12

Question


If I compile this:

__global__ void dummy_kernel(float *a, int N, float* b, int N2){
    unsigned int i = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int j = blockIdx.x*blockDim.x + threadIdx.x; 
}

I get this empty PTX code:

.entry _Z9dummy_kernelPfiS_i(
.param .u64 _Z9dummy_kernelPfiS_i_param_0,
.param .u32 _Z9dummy_kernelPfiS_i_param_1,
.param .u64 _Z9dummy_kernelPfiS_i_param_2,
.param .u32 _Z9dummy_kernelPfiS_i_param_3
)
{

ret; 
}

Is there a way to force the compiler to generate PTX without optimizing at all?


Answer 1:


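Try the -g -G switches and see what the compiler puts out. The -G switch generates debug information for device code, which turns off device code optimizations. I'm not sure that will cover all possible optimizations.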
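As a rough sketch of how one might try this, the file below keeps the kernel from the question and lists candidate nvcc command lines in its header comment. The file name dummy.cu, the kernel dummy_kernel_observable, and the helper run_observable are hypothetical names added for illustration; they are not part of the original post. The second kernel shows a different approach from the one the answer suggests: if the indices are stored to global memory, they become observable results, so the index arithmetic should remain in the PTX even in an optimized build.

```cuda
// dummy.cu (hypothetical file name)
//
// PTX with device optimizations disabled, as the answer suggests:
//   nvcc -g -G -ptx dummy.cu -o dummy_debug.ptx
// PTX from a default (optimized) build, for comparison:
//   nvcc -ptx dummy.cu -o dummy_opt.ptx
// Runnable binary:
//   nvcc dummy.cu -o dummy && ./dummy
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question: i and j are computed but never used.
__global__ void dummy_kernel(float *a, int N, float *b, int N2) {
    unsigned int i = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int j = blockIdx.x * blockDim.x + threadIdx.x;
}

// Hypothetical variant (not in the original post): storing i and j to global
// memory makes them observable, so the index math cannot be discarded.
__global__ void dummy_kernel_observable(float *a, int N, float *b, int N2) {
    unsigned int i = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < (unsigned int)N)  a[i] = (float)i;
    if (j < (unsigned int)N2) b[j] = (float)j;
}

// Hypothetical host helper: runs the observable kernel on two n-element
// buffers (n * n must not exceed 1024 threads). Returns 0 on success.
int run_observable(float *host_a, float *host_b, int n) {
    float *dev_a = nullptr;
    float *dev_b = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev_a, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dev_b, bytes) != cudaSuccess) { cudaFree(dev_a); return 1; }

    dim3 block(n, n);  // a single block covering an n x n index space
    dummy_kernel_observable<<<1, block>>>(dev_a, n, dev_b, n);

    cudaError_t err = cudaGetLastError();  // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(host_a, dev_a, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(host_b, dev_b, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const int n = 4;
    float a[n] = {0};
    float b[n] = {0};
    if (run_observable(a, b, n) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int k = 0; k < n; ++k) printf("a[%d] = %.0f  b[%d] = %.0f\n", k, a[k], k, b[k]);
    return 0;
}
```

Comparing dummy_debug.ptx with dummy_opt.ptx should show whether the -G build keeps the index computations for dummy_kernel. The exact PTX is likely to differ between toolkit versions, so it is worth inspecting the output rather than assuming what it contains.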



Source: https://stackoverflow.com/questions/12883377/how-to-compile-cuda-kernel-without-optimizing-at-all
