How to measure the execution time of every block when using CUDA?

遥遥无期 2020-12-11 07:35

clock() is not accurate enough.

3 answers

暗喜 (original poster)

2020-12-11 08:09
How about calling clock() in every CUDA thread to record its start and stop times? Store them in an array laid out so that the array index tells you which thread started and stopped at which time, like this:
```
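__global__ void kclock(unsigned int *ts) {
    // Global thread index; each thread owns two consecutive slots in ts.
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // clock() reads the per-multiprocessor cycle counter.
    unsigned int start_time = (unsigned int)clock();

    // Code we need to measure should go here.

    unsigned int stop_time = (unsigned int)clock();

    ts[tid * 2] = start_time;     // even slot: start time
    ts[tid * 2 + 1] = stop_time;  // odd slot: stop time
}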
```
Then use this array to find the minimum start time and the maximum stop time for the block you are interested in. For example, work out the range of indices in the time array that corresponds to block (0, 0), then take the minimum of the start times and the maximum of the stop times over that range. Their difference is the block's execution time in clock cycles.
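The min/max step could look roughly like the host-side sketch below, run after copying the timestamp array back from the device. The helper name `block_cycles` and its parameters are made up for illustration. It assumes a 1D grid with the same even/odd layout as the kernel above, and that the 32-bit counter does not wrap around while the block is running.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: cycles between the earliest start and the latest stop
// among the threads of one block. ts holds (start, stop) pairs per thread,
// as written by the kernel and copied back to the host.
unsigned int block_cycles(const std::vector<unsigned int>& ts,
                          unsigned int block,
                          unsigned int threads_per_block) {
    // Index of the first thread that belongs to this block.
    std::size_t first = static_cast<std::size_t>(block) * threads_per_block;
    assert(threads_per_block > 0);
    assert((first + threads_per_block) * 2 <= ts.size());

    unsigned int min_start = ts[first * 2];
    unsigned int max_stop = ts[first * 2 + 1];
    for (std::size_t t = first; t < first + threads_per_block; ++t) {
        min_start = std::min(min_start, ts[t * 2]);     // even slot: start
        max_stop = std::max(max_stop, ts[t * 2 + 1]);   // odd slot: stop
    }
    // Assumes the counter did not wrap between min_start and max_stop.
    return max_stop - min_start;
}
```

The result is a cycle count, so converting it to seconds requires dividing by the GPU clock rate.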