CUDA streams not overlapping

-上瘾入骨i 2021-02-20 16:16

I have something very similar to the following code:

int k, no_streams = 4;
cudaStream_t stream[no_streams];
for(k = 0; k < no_streams; k++) cudaStreamCreate(&strea         


        
2 Answers
  • 2021-02-20 17:09

    If you want to see kernels overlapping with other kernels (concurrent kernel execution), you need the Visual Profiler that ships with the CUDA 5.0 Toolkit. I don't think earlier profiler versions are capable of showing this. It should also show kernel and memcpy overlap.
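
    A profiler timeline can only show overlap that the device is able to perform. As a minimal sketch, assuming device 0 is the GPU in question (the original post does not say), the runtime's cudaGetDeviceProperties call reports whether concurrent kernels are supported and how many copy engines are available:

    ```cuda
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int device = 0;  // assumption: query the first GPU in the system
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, device);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        printf("Device:            %s\n", prop.name);
        // 1 if the device can run several kernels at the same time, else 0.
        printf("concurrentKernels: %d\n", prop.concurrentKernels);
        // Number of copy engines; 0 means copies cannot overlap with kernels.
        printf("asyncEngineCount:  %d\n", prop.asyncEngineCount);
        return 0;
    }
    ```

    If concurrentKernels is 0, no profiler version will show kernel/kernel overlap on that device.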

  • 2021-02-20 17:12

    According to this post on the NVIDIA forums, the profiler serializes stream execution in order to collect accurate timing data. If you think your timings are off, make sure you are measuring with CUDA events...

    I've been experimenting with streams lately, and I found the "simpleMultiCopy" example from the SDK really helpful, particularly for its logic and synchronization.
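
    To make the event-based timing concrete, here is a rough sketch. It is not the code from the question or from simpleMultiCopy. The kernel `scale`, the chunk size, and the `CHECK` macro are all made up for illustration, and the stream count of 4 is taken from the question. It assumes the default (legacy) behaviour of stream 0, where an event recorded in stream 0 waits for work issued earlier in streams created with cudaStreamCreate.

    ```cuda
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Hypothetical helper: abort with a message if a CUDA runtime call fails.
    #define CHECK(call)                                                 \
        do {                                                            \
            cudaError_t e = (call);                                     \
            if (e != cudaSuccess) {                                     \
                fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                        cudaGetErrorString(e));                         \
                exit(1);                                                \
            }                                                           \
        } while (0)

    // Hypothetical kernel, for illustration only: scales each element in place.
    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int no_streams = 4;    // same count as in the question
        const int chunk = 1 << 20;   // elements per stream (assumed size)
        const size_t bytes = chunk * sizeof(float);

        float *h_data = NULL, *d_data = NULL;
        // Pinned host memory, so cudaMemcpyAsync can run truly asynchronously.
        CHECK(cudaMallocHost((void **)&h_data, no_streams * bytes));
        CHECK(cudaMalloc((void **)&d_data, no_streams * bytes));
        for (int i = 0; i < no_streams * chunk; i++) h_data[i] = 1.0f;

        cudaStream_t stream[no_streams];
        for (int k = 0; k < no_streams; k++) CHECK(cudaStreamCreate(&stream[k]));

        cudaEvent_t start, stop;
        CHECK(cudaEventCreate(&start));
        CHECK(cudaEventCreate(&stop));

        // Assumes legacy default-stream semantics: events in stream 0 are
        // ordered against the work in the blocking streams created above.
        CHECK(cudaEventRecord(start, 0));
        for (int k = 0; k < no_streams; k++) {
            float *h = h_data + k * chunk;
            float *d = d_data + k * chunk;
            // Each stream copies its chunk in, processes it, and copies it back.
            CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream[k]));
            scale<<<(chunk + 255) / 256, 256, 0, stream[k]>>>(d, chunk, 2.0f);
            CHECK(cudaGetLastError());
            CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream[k]));
        }
        CHECK(cudaEventRecord(stop, 0));
        CHECK(cudaEventSynchronize(stop));  // wait until all queued work is done

        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        printf("elapsed: %.3f ms\n", ms);

        // Sanity check: every element started at 1.0f and was scaled by 2.0f.
        int mismatches = 0;
        for (int i = 0; i < no_streams * chunk; i++)
            if (h_data[i] != 2.0f) mismatches++;
        printf("mismatches: %d\n", mismatches);

        for (int k = 0; k < no_streams; k++) CHECK(cudaStreamDestroy(stream[k]));
        CHECK(cudaEventDestroy(start));
        CHECK(cudaEventDestroy(stop));
        CHECK(cudaFreeHost(h_data));
        CHECK(cudaFree(d_data));
        return 0;
    }
    ```

    Comparing this elapsed time with a run that issues the same work in a single stream gives a profiler-independent indication of whether anything overlaps.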
