Question
If I compile this kernel:
__global__ void dummy_kernel(float *a, int N, float *b, int N2) {
    unsigned int i = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int j = blockIdx.x * blockDim.x + threadIdx.x;
}
I get this empty PTX code:
.entry _Z9dummy_kernelPfiS_i(
    .param .u64 _Z9dummy_kernelPfiS_i_param_0,
    .param .u32 _Z9dummy_kernelPfiS_i_param_1,
    .param .u64 _Z9dummy_kernelPfiS_i_param_2,
    .param .u32 _Z9dummy_kernelPfiS_i_param_3
)
{
    ret;
}
Is there a way to force the compiler to generate PTX without optimizing at all?
Answer 1:
Try the -g -G switches and see what it puts out. I'm not sure that will cover all possible optimizations.
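As a rough sketch of a complementary approach: the values i and j in the question are never used, so the compiler is free to drop them as dead code. Writing them to global memory makes the index arithmetic observable, so it should show up in the PTX even without debug flags. The bounds check, the reading of N as a row count and N2 as a column count, and the host helper run_dummy are assumptions added for illustration; they are not part of the original question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same signature as the kernel in the question. The stores to a and b are
// an addition: they give the index computations a visible side effect.
__global__ void dummy_kernel(float *a, int N, float *b, int N2) {
    unsigned int i = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int j = blockIdx.x * blockDim.x + threadIdx.x;
    // Assumption: N is the number of rows and N2 the number of columns.
    if (i < (unsigned int)N && j < (unsigned int)N2) {
        a[i * N2 + j] = (float)i;
        b[i * N2 + j] = (float)j;
    }
}

// Hypothetical host helper: launches the kernel on a rows x cols grid and
// copies both arrays back. Returns true if every CUDA call succeeded.
bool run_dummy(int rows, int cols, std::vector<float> &a, std::vector<float> &b) {
    size_t count = (size_t)rows * cols;
    size_t bytes = count * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return false; }

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    dummy_kernel<<<grid, block>>>(d_a, rows, d_b, cols);

    a.resize(count);
    b.resize(count);
    // cudaMemcpy waits for the kernel on the default stream to finish.
    bool ok = cudaMemcpy(a.data(), d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
           && cudaMemcpy(b.data(), d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Assuming the file is saved as dummy.cu (a made-up name), something like nvcc -ptx -G dummy.cu -o dummy.ptx should emit the PTX with device optimizations turned off, and dropping -G lets you compare against the optimized output.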
Source: https://stackoverflow.com/questions/12883377/how-to-compile-cuda-kernel-without-optimizing-at-all