CUDA device to host copy very slow

太阳男子 2021-01-03 08:17

I'm running Windows 7 64-bit, CUDA 4.2, Visual Studio 2010.

First, I run some code on CUDA, then download the data back to the host. Then I do some processing and move b

2 answers
  •  Happy的楠姐
    2021-01-03 08:42

    The problem is one of timing, not of any change in copy performance. Kernel launches are asynchronous in CUDA, so what you are measuring is not just the time for thrust::copy but also the time for the previously launched kernel to complete. If you change your timing code for the copy operation to something like this:

    cudaDeviceSynchronize(); // wait until prior kernel is finished
    start=clock();
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
    end=clock();
    cout << "Time Spent:" << double(end - start) / CLOCKS_PER_SEC << endl; // elapsed seconds

    You should find the transfer times are restored to their previous performance. So your real question isn't "why is thrust::copy slow", it is "why is my kernel slow". And based on the rather terrible pseudo code you posted, the answer is "because it is full of atomicExch() calls which serialise kernel memory transactions".
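
    To see the split between kernel time and copy time directly, here is a minimal sketch that times the two separately. It uses CUDA events rather than the clock() approach above, which is a swapped-in technique and not what the answer used. The kernel slow_kernel, the Timings struct, and the function time_kernel_and_copy are hypothetical names made up for illustration, since the asker's real kernel is not shown; it assumes everything runs on the default stream.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical stand-in for the asker's kernel (the original is not shown):
// each thread does `iters` rounds of arithmetic so the kernel takes a while.
__global__ void slow_kernel(unsigned int *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 3u + 1u;
        data[i] = v;
    }
}

// Hypothetical result type: elapsed milliseconds for each phase.
struct Timings {
    float kernel_ms;
    float copy_ms;
};

// Times the kernel and the device-to-host copy as two separate intervals.
Timings time_kernel_and_copy(int n, int iters)
{
    assert(n > 0);
    thrust::device_vector<unsigned int> d_b(n, 1u);
    thrust::host_vector<unsigned int> h_a(n);

    cudaEvent_t before_kernel, after_kernel, after_copy;
    cudaEventCreate(&before_kernel);
    cudaEventCreate(&after_kernel);
    cudaEventCreate(&after_copy);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // The launch returns to the host immediately; the events mark where the
    // kernel starts and finishes on the GPU's own timeline.
    cudaEventRecord(before_kernel);
    slow_kernel<<<blocks, threads>>>(thrust::raw_pointer_cast(d_b.data()), n, iters);
    cudaEventRecord(after_kernel);

    // The copy cannot begin until the kernel has finished, which is why a
    // host-side timer wrapped around it alone also counts the kernel time.
    thrust::copy(d_b.begin(), d_b.end(), h_a.begin());
    cudaEventRecord(after_copy);
    cudaEventSynchronize(after_copy); // wait until all three events have completed

    Timings t;
    cudaEventElapsedTime(&t.kernel_ms, before_kernel, after_kernel);
    cudaEventElapsedTime(&t.copy_ms, after_kernel, after_copy);

    cudaEventDestroy(before_kernel);
    cudaEventDestroy(after_kernel);
    cudaEventDestroy(after_copy);
    return t;
}
```

    Calling time_kernel_and_copy(1 << 20, 1000) from main and printing both fields should show the copy time staying roughly constant while the kernel time grows with iters. Error checking on the CUDA calls is left out to keep the sketch short.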
