Question
I'm using CUDA 8.0 on a Xeon-based system with a GTX Titan X (GM200). It works fine, but I get long overheads compared to the weak GTX 600 series card I have at home. Specifically, when I look at a timeline I find that a call to cudaGetCacheConfig() consistently takes the CUDA runtime API an incredible amount of time: 530-560 msec, or over 0.5 seconds. Other calls don't take nearly as long. For example, cuDeviceGetTotalMem takes 0.7 msec (also quite a bit of time, but orders of magnitude less), and cuDeviceGetAttribute (which is probably limited to host-side code only) takes 0.031 msec.
Why is this happening? Or rather - how could that be possible? And can I do anything to ameliorate this situation?
Notes:
- The cudaGetCacheConfig() gets called after cudaGetDeviceCount(), but probably (not 100% certain) not before any other runtime API calls.
- If I prepend a cudaGetDeviceProperties() call before the cudaGetCacheConfig() call, the former takes ~0.6 msec and the latter still takes over 0.5 sec (581 msec in my last measurement).
Answer 1:
TL;DR: CUDA lazy initialization (as @RobertCrovella suggests).
@RobertCrovella explains in the dupe bug:
"CUDA initialization usually includes establishment of UVM, which involves harmonizing of device and host memory maps. If your server has more system memory than your PC, it is one possible explanation for the disparity in initialization time. The OS may have an effect as well, finally the memory size of the GPU may have an effect."
The machine on which I get this behavior has 256 GB of memory, 32 times as much as my home machine, and the GPU itself has 12 GB, 4 times as much as the GPU in my home machine. This means I can, unfortunately, expect much longer initialization of the CUDA driver and/or runtime API than on my home machine. Some or all of this initialization is performed lazily, and in my case it happens to be triggered when cudaGetCacheConfig() is called; I suppose the other calls only require some of the initialization (not clear why, though).
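If you want to check this on your own machine, here is a minimal sketch of how one might do it. It is my own illustration, not something from the linked discussion, and it rests on a few assumptions: it uses cudaFree(0), a commonly used idiom for forcing the runtime to initialize, and cudaDeviceGetCacheConfig(), the runtime API name I know to exist for the cache-config query. The names time_ms and report_init_cost are hypothetical helpers. The CUDA part is guarded by __CUDACC__ so that it is only compiled by nvcc.

```cpp
#include <chrono>
#include <cstdio>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Wall-clock time of a single call to f, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

#ifdef __CUDACC__
// Hypothetical helper: trigger initialization explicitly, then time the query.
void report_init_cost() {
    // Assumption: cudaFree(0) forces the (otherwise lazy) initialization here.
    const double init_ms = time_ms([] { cudaFree(0); });

    // With initialization already done, time the cache-config query on its own.
    cudaFuncCache config;
    const double query_ms = time_ms([&] { cudaDeviceGetCacheConfig(&config); });

    std::printf("forced initialization:               %.3f ms\n", init_ms);
    std::printf("cudaDeviceGetCacheConfig afterwards: %.3f ms\n", query_ms);
}

int main() { report_init_cost(); }
#endif
```

Save it as a .cu file and build it with nvcc (older toolkits may need -std=c++11 for the lambdas). If lazy initialization is the explanation, I would expect the first number to absorb most of the half second and the second to be small. Note that this only moves the cost to a point of your choosing, such as program startup; it does not make the initialization itself any cheaper.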
Source: https://stackoverflow.com/questions/42162337/cudagetcacheconfig-takes-0-5-seconds-how-why