Should I look into PTX to optimize my kernel? If so, how?
Question

Do you recommend reading your kernel's PTX code to find out how to optimize the kernel further? One example: I read that you can tell from the PTX code whether automatic loop unrolling worked. If it did not, you would have to unroll the loops manually in the kernel code.

Are there other use cases for the PTX code? Do you look into your PTX code? Where can I learn how to read the PTX code that CUDA generates for my kernels?

Answer 1:

The first point to make about PTX is that it