CUDA global memory access speed

Submitted by 末鹿安然 on 2019-12-02 10:26:10

When you delete the code line:

direct_map[index] = -1; 

your kernel no longer does anything useful. Without that store, the kernel does not affect any global state, so from the compiler's perspective its code is effectively dead. The compiler can recognize this and eliminate most of the kernel's code, which would explain the large drop in execution time.

You can verify this by dumping the machine code (SASS) that the compiler generates in each case and comparing the two listings, for example with cuobjdump -sass myexecutable
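As a rough illustration of the effect, here is a minimal sketch that times two kernels, one that keeps the store and one that does not. The original kernel is not shown in the question, so the kernel names (with_store, without_store), the helper time_kernel, the array size and the launch configuration are all assumptions made for this example. Only the direct_map[index] = -1 store comes from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question: the store to
// global memory makes the kernel's work observable.
__global__ void with_store(int *direct_map, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        direct_map[index] = -1;  // the line from the question
    }
}

// Same indexing, but the store is removed. Nothing computed here is
// visible outside the thread, so the compiler is free to discard it.
__global__ void without_store(int *direct_map, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        int unused = index * 2;  // dead computation, no global side effect
        (void)unused;
    }
}

// Times a single launch with CUDA events, after one warm-up launch.
// Error checking is mostly omitted to keep the sketch short.
float time_kernel(void (*kernel)(int *, int), int *d_map, int n)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    kernel<<<grid, block>>>(d_map, n);  // warm-up launch
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    kernel<<<grid, block>>>(d_map, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 24;  // assumed problem size
    int *d_map = nullptr;
    if (cudaMalloc(&d_map, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    printf("with store:    %f ms\n", time_kernel(with_store, d_map, n));
    printf("without store: %f ms\n", time_kernel(without_store, d_map, n));

    cudaFree(d_map);
    return 0;
}
```

If this is saved as deadstore.cu (a file name chosen for the example), it can be built with nvcc -O3 -o deadstore deadstore.cu and inspected with cuobjdump -sass deadstore. In the listing, without_store would be expected to contain little more than an exit instruction, while with_store retains the index computation and the global store. The exact output depends on the compiler version and target architecture.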

Anytime you make a small change to the code and see a large change in timing, you should suspect that the change you made has allowed the compiler to make different optimization decisions.
