Timing Kernel launches in CUDA while using Thrust

Posted by 末鹿安然 on 2020-02-27 09:33:19

Question


Kernel launches in CUDA are generally asynchronous, which (as I understand it) means that once a CUDA kernel is launched, control returns immediately to the CPU. The CPU can continue doing useful work while the GPU is busy number crunching, unless the CPU is forced to wait with cudaThreadSynchronize() or cudaMemcpy().

Now I have just started using the Thrust library for CUDA. Are the function calls in Thrust synchronous or asynchronous?

In other words, if I invoke thrust::sort(D.begin(),D.end()); where D is a device vector, does it make sense to measure the sorting time using

        start = clock();  // start the timer

        thrust::sort(D.begin(), D.end());

        diff = (clock() - start) / (double)CLOCKS_PER_SEC;
        std::cout << "\nDevice Time taken is: " << diff << std::endl;

If the function call is asynchronous, then diff will be close to 0 seconds for any vector (which is useless as a timing), but if it is synchronous I will get the actual running time.


Answer 1:


Thrust calls that invoke kernels are asynchronous, just like the underlying CUDA APIs that Thrust uses. Thrust calls that copy data are synchronous, again just like the underlying CUDA APIs.

So your example would measure only the kernel launch and Thrust host-side setup overheads, not the operation itself. For timing, you can get around this by calling either cudaThreadSynchronize or cudaDeviceSynchronize (the latter in CUDA 4.0 or later, where cudaThreadSynchronize is deprecated) after the Thrust kernel launch. Alternatively, if you include a copy operation after the kernel launch and record the stop time after it, your timing will include setup, execution, and copying time.

In your example this would look something like

    start = clock();  // start the timer

    thrust::sort(D.begin(), D.end());
    cudaDeviceSynchronize();  // block until the kernel is finished (cudaThreadSynchronize() before CUDA 4.0)

    diff = (clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "\nDevice Time taken is: " << diff << std::endl;
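
For anyone who wants to try this end to end, here is a minimal sketch of a complete program that times the same sort three ways: with no explicit synchronization, with cudaDeviceSynchronize(), and with a device-to-host copy acting as the synchronization point. The helper time_sort, the Mode enum, the problem size, and the use of std::chrono::steady_clock instead of clock() are my own assumptions for illustration, not part of the original question or answer. Depending on the Thrust version, the algorithm itself may already block before returning, in which case the first two numbers come out close together.

```cuda
// Sketch only. Assumed build line: nvcc -O2 -o time_sort time_sort.cu
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>

#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <iostream>

using Clock = std::chrono::steady_clock;  // wall-clock timer (assumption: used in place of clock())

// Hypothetical modes, named here only for this illustration.
enum class Mode { NoSync, Sync, SyncViaCopy };

// Hypothetical helper: sorts a device copy of `input` and returns the elapsed seconds.
double time_sort(const thrust::host_vector<int>& input, Mode mode) {
    thrust::device_vector<int> d = input;         // host-to-device copy, not timed
    thrust::host_vector<int> out(input.size());   // destination for the optional copy back

    const Clock::time_point t0 = Clock::now();
    thrust::sort(d.begin(), d.end());
    if (mode == Mode::Sync) {
        cudaDeviceSynchronize();                  // block until the GPU work has finished
    } else if (mode == Mode::SyncViaCopy) {
        thrust::copy(d.begin(), d.end(), out.begin());  // device-to-host copy blocks the host
    }
    const double elapsed = std::chrono::duration<double>(Clock::now() - t0).count();

    cudaDeviceSynchronize();                      // make sure the work is done before returning
    return elapsed;
}

int main() {
    const std::size_t n = 1 << 22;                // hypothetical problem size
    thrust::host_vector<int> h(n);
    for (std::size_t i = 0; i < n; ++i) h[i] = std::rand();

    // Warm-up run so one-time CUDA initialization cost is not counted below.
    time_sort(h, Mode::Sync);

    std::cout << "no explicit sync:      " << time_sort(h, Mode::NoSync) << " s\n";
    std::cout << "cudaDeviceSynchronize: " << time_sort(h, Mode::Sync) << " s\n";
    std::cout << "sort + copy to host:   " << time_sort(h, Mode::SyncViaCopy) << " s\n";
    return 0;
}
```

The warm-up call matters because the first CUDA call in a process also pays for context creation, which would otherwise inflate whichever measurement runs first.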


Source: https://stackoverflow.com/questions/8091219/timing-kernel-launches-in-cuda-while-using-thrust
