How to measure the execution time of every block when using CUDA?

遥遥无期 2020-12-11 07:35

clock() is not accurate enough.

3 Answers
  •  暗喜 (original poster)
    2020-12-11 08:09

    How about calling clock() in every CUDA thread to record its start and end times, and storing them in an array laid out so that the array index tells you which thread started/stopped at which time? For example:

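    // Each thread records its own start/stop values of the device clock()
    // counter. clock() reads a per-multiprocessor cycle counter, so values are
    // only comparable between threads of the same block (same SM).
    __global__ void kclock(unsigned int *ts) {
        // Global thread index (1-D grid); each thread owns two slots in ts.
        unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    
        unsigned int start_time = (unsigned int)clock();
    
        // Code we need to measure should go here.
    
        unsigned int stop_time = (unsigned int)clock();
    
        ts[tid * 2] = start_time;     // start cycle of thread tid
        ts[tid * 2 + 1] = stop_time;  // stop cycle of thread tid
    }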
    

    Then use this array to find the minimum start time and the maximum stop time of the block you are interested in. For example, compute the range of indices in the time array that belongs to block (0, 0), take the min over its start entries and the max over its stop entries, and the difference is that block's execution time in clock cycles.
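    A minimal host-side sketch of the min/max step follows. It assumes the ts layout written by kclock above (two unsigned int slots per global thread, 1-D grid) and that ts has already been copied back from the device, for example with cudaMemcpy. blockCycles is a hypothetical helper name, not part of the CUDA API. It is plain C++, so it can live in the same .cu file as the rest of the host code. Because clock() is a per-multiprocessor counter, the min/max is only meaningful within a single block, which is what this computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: elapsed clock cycles of one block, computed from the
// ts[] array written by kclock. Assumed layout (1-D grid), for global thread
// id tid = blockIdx.x * blockDim.x + threadIdx.x:
//   ts[2 * tid]     = start clock of thread tid
//   ts[2 * tid + 1] = stop clock of thread tid
// Assumes the 32-bit counter does not wrap while the block is running.
unsigned int blockCycles(const std::vector<unsigned int>& ts,
                         unsigned int block, unsigned int blockDim) {
    const std::size_t first = static_cast<std::size_t>(block) * blockDim;  // first thread of the block
    const std::size_t last = first + blockDim;                             // one past its last thread
    assert(blockDim > 0 && ts.size() >= 2 * last);  // block must lie inside ts

    unsigned int minStart = ts[2 * first];
    unsigned int maxStop = ts[2 * first + 1];
    for (std::size_t t = first + 1; t < last; ++t) {
        minStart = std::min(minStart, ts[2 * t]);     // earliest start in the block
        maxStop = std::max(maxStop, ts[2 * t + 1]);   // latest stop in the block
    }
    return maxStop - minStart;
}
```

    Calling blockCycles(ts, b, blockDim) for each b from 0 to gridDim.x - 1 gives one cycle count per block. If the 32-bit counter could wrap while a block runs, switching the kernel to clock64() and storing long long values avoids that.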
