Trouble measuring the elapsed time of a CUDA program and CUDA kernels
Question

I currently have three methods of measuring elapsed time: two use CUDA events, and the third records start and end UNIX timestamps. Of the two that use CUDA events, one measures the time of the entire outer loop, and the other sums the execution times of all kernels. Here's the code:

    int64 x1, x2;
    cudaEvent_t start;
    cudaEvent_t end;
    cudaEvent_t s1, s2;
    float timeValue;
    #define timer_s cudaEventRecord(start, 0);
    #define timer_e cudaEventRecord(end, 0); cudaEventSynchronize(end); cudaEventElapsedTime(