CUDA: Does passing arguments to a kernel slow the kernel launch much?

寵の児 posted on 2019-12-04 07:05:56

I would expect the benefit of such an optimization to be rather small. On sane platforms (i.e. anything other than WDDM), kernel launch overhead is only on the order of 10-20 microseconds, so there probably isn't much scope for improvement.

Having said that, if you want to try, the logical way to achieve this is to use constant memory. Define each argument as a __constant__ symbol at translation unit scope, then use the cudaMemcpyToSymbol function to copy the values from the host to device constant memory before launching the kernel.
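
A minimal sketch of that approach might look like the following. The struct ScaleParams, the symbol c_params, the kernel scale_kernel and the helper scale_on_device are all hypothetical names made up for illustration; only __constant__ and cudaMemcpyToSymbol come from the answer above. It assumes n is greater than zero and that d_data already points to n floats in device memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter bundle (illustrative names, not from the original answer).
struct ScaleParams {
    float factor;
    int   n;
};

// Declared at translation unit scope, so it lives in device constant memory.
__constant__ ScaleParams c_params;

// The kernel takes only the data pointer; everything else is read from c_params.
__global__ void scale_kernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.n) {
        data[i] *= c_params.factor;
    }
}

// Copies the parameters to the __constant__ symbol, launches the kernel,
// waits for it to finish, and returns the first CUDA error encountered.
cudaError_t scale_on_device(float *d_data, float factor, int n)
{
    ScaleParams h_params = {factor, n};
    cudaError_t err = cudaMemcpyToSymbol(c_params, &h_params, sizeof(h_params));
    if (err != cudaSuccess) return err;

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;  // assumes n > 0
    scale_kernel<<<blocks, threads>>>(d_data);

    err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Note that cudaMemcpyToSymbol is itself a runtime API call with its own cost, so this is only likely to pay off, if at all, when the same parameter values are reused across many launches.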

Simple answer: no.

To elaborate: you need to send some signals from the host to the GPU anyway in order to launch the kernel itself. At that point, a few more bytes of parameter data no longer make a difference.
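
If you would rather check this on your own hardware than take it on faith, one rough way is to time a batch of launches of an empty kernel with and without arguments. The sketch below is only an illustration: the kernel names, the argument list and the helper average_launch_us are invented for the example. It measures the average wall-clock time per launch over a back-to-back batch (including the kernels' trivial execution), not the latency of a single isolated launch.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Two empty kernels: one without arguments, one with a handful of them.
__global__ void empty_no_args() {}
__global__ void empty_with_args(int a, float b, double c, const float *p, int d, int e) {}

// Runs `launch` `iterations` times back to back and returns the average
// wall-clock time per launch in microseconds. One untimed warm-up launch
// comes first so that one-time initialization is excluded from the timing.
template <typename Launch>
double average_launch_us(Launch launch, int iterations)
{
    launch();
    cudaDeviceSynchronize();

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        launch();
    }
    cudaDeviceSynchronize();
    auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::micro>(stop - start).count() / iterations;
}

double time_no_args(int iterations)
{
    return average_launch_us([] { empty_no_args<<<1, 1>>>(); }, iterations);
}

double time_with_args(int iterations)
{
    return average_launch_us(
        [] { empty_with_args<<<1, 1>>>(1, 2.0f, 3.0, nullptr, 4, 5); }, iterations);
}
```

Calling time_no_args(10000) and time_with_args(10000) from main and printing both values gives a quick comparison; the absolute numbers will depend on the GPU, driver and operating system.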
