cudaGetCacheConfig takes 0.5 seconds - how/why? [duplicate]
Question

This question already has answers here: slowness of first cudaMalloc (K40 vs K20), even after cudaSetDevice (2 answers). Closed 2 years ago.

I'm using CUDA 8.0 on a Xeon-based system with a GTX Titan X (GM200). It works fine, but I see long overheads compared to the weak GTX 600 series card I have at home. Specifically, when I profile the timeline, I find that a call to cudaGetCacheConfig() consistently takes an incredible amount of time in the CUDA runtime API: 530-560 ms, or over 0.5 seconds. This, while