cudaGetCacheConfig takes 0.5 seconds - how/why? [duplicate]

[亡魂溺海] submitted on 2019-11-28 13:41:39

Question


I'm using CUDA 8.0 on a Xeon-based system with a GTX Titan X (GM200). It works fine, but I see much longer overheads than with the weak GTX 600 series card I have at home. Specifically, when I profile the timeline, I find that a call to cudaGetCacheConfig() consistently takes the CUDA runtime API an incredible amount of time: 530-560 msec, or over 0.5 seconds. Other calls take nowhere near as long. For example, cuDeviceGetTotalMem takes 0.7 msec (also quite a bit of time, but nearly three orders of magnitude less), and cuDeviceGetAttribute (which is probably limited to host-side code only) takes 0.031 msec.

Why is this happening? Or rather - how could that be possible? And can I do anything to ameliorate this situation?

Notes:

  • The cudaGetCacheConfig() gets called after cudaGetDeviceCount(), but probably (not 100% certain) not before any other runtime API calls.
  • If I insert a cudaGetDeviceProperties() call before the cudaGetCacheConfig() call, the former takes ~0.6 msec and the latter still takes over 0.5 sec (581 msec in my last measurement).

Answer 1:


TL;DR: CUDA lazy initialization (as @RobertCrovella suggests).

@RobertCrovella explains in the duplicate question:

CUDA initialization usually includes establishment of UVM, which involves harmonizing of device and host memory maps. If your server has more system memory than your PC, it is one possible explanation for the disparity in initialization time. The OS may have an effect as well, finally the memory size of the GPU may have an effect.

The machine on which I get this behavior has 256 GB of memory, 32 times more than my home machine, and the GPU itself has 12 GB, 4 times more than the GPU in my home machine. This means that, unfortunately, I can expect the CUDA driver and/or runtime API to take much longer to initialize than on my home machine. Some or all of this initialization is performed lazily, and in my case it happens to be triggered by the cudaGetCacheConfig() call. I suppose the other calls only require part of the initialization (it is not clear why, though).
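To make the lazy-initialization effect concrete, here is a minimal C++ sketch of how I would time it. It does not call CUDA at all: FakeRuntime is a hypothetical stand-in whose 50 ms init cost is an arbitrary assumption, and time_call_ms is a helper name I made up. The point is only that whichever call first needs the full context pays the whole one-time cost, and later calls are cheap.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Wall-clock milliseconds taken by a single invocation of `call`.
template <typename F>
double time_call_ms(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for a lazily initialized runtime. This is NOT the
// CUDA API; the 50 ms cost is an assumed placeholder for context creation.
struct FakeRuntime {
    bool initialized = false;
    int init_count = 0;
    std::chrono::milliseconds init_cost{50};

    // One-time expensive setup, performed on first demand only.
    void ensure_initialized() {
        if (!initialized) {
            std::this_thread::sleep_for(init_cost);
            initialized = true;
            ++init_count;
        }
    }

    // Cheap query that never touches the context.
    int device_count() const { return 1; }

    // Query that needs the full context, so it may trigger initialization.
    int cache_config() {
        ensure_initialized();
        assert(initialized);  // the context must exist by this point
        return 0;
    }

    // Explicit warm-up: pay the initialization cost at a moment you choose.
    void warm_up() { ensure_initialized(); }
};
```

With a fresh FakeRuntime, timing cache_config() twice via time_call_ms shows the first call taking roughly the init cost and the second taking almost nothing. Calling warm_up() beforehand moves the cost out of cache_config() entirely. In real CUDA code, the analogous experiment would be to issue a call such as cudaFree(0) at startup (a commonly used idiom for forcing context creation) and then time cudaGetCacheConfig(). I would expect the half second to move to the warm-up call rather than disappear, though I have not measured that on this machine.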



Source: https://stackoverflow.com/questions/42162337/cudagetcacheconfig-takes-0-5-seconds-how-why
