I'd like to measure the time a bit of code within my kernel takes. I've followed this question along with its comments so that my kernel looks something l
clock64() returns a value in graphics clock cycles. The graphics clock frequency is dynamic, so I would not recommend using a constant to convert cycles to seconds. If you want wall-clock time, the better option is globaltimer, a 64-bit clock in nanoseconds.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#special-registers-globaltimer
asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(start));
The value read into start is in nanoseconds.
The default resolution is 32 ns, with an update every µs. The NVIDIA performance tools force the update to every 32 ns (31.25 MHz). CUPTI uses this clock for the start time when capturing a concurrent kernel trace.
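As a rough illustration of how the asm line above could be used to time a region inside a kernel, here is a minimal sketch. The names read_globaltimer and timed_kernel, the placeholder loop, and the launch configuration are all hypothetical and only stand in for the code you actually want to measure; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads the PTX %globaltimer special register (nanoseconds).
__device__ __forceinline__ unsigned long long read_globaltimer()
{
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical kernel: times a placeholder loop and reports the
// elapsed nanoseconds seen by thread 0.
__global__ void timed_kernel(float *data, unsigned long long *elapsed_ns)
{
    unsigned long long start = read_globaltimer();

    // Placeholder for the code being measured.
    float x = data[threadIdx.x];
    for (int i = 0; i < 100000; ++i)
        x = x * 1.0001f + 0.5f;
    data[threadIdx.x] = x;  // store the result so the loop is not optimized away

    unsigned long long stop = read_globaltimer();

    if (threadIdx.x == 0)
        *elapsed_ns = stop - start;
}

int main()
{
    float *d_data;
    unsigned long long *d_elapsed;
    unsigned long long h_elapsed = 0;

    cudaMalloc(&d_data, 32 * sizeof(float));
    cudaMemset(d_data, 0, 32 * sizeof(float));
    cudaMalloc(&d_elapsed, sizeof(unsigned long long));

    timed_kernel<<<1, 32>>>(d_data, d_elapsed);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_elapsed, d_elapsed, sizeof(h_elapsed), cudaMemcpyDeviceToHost);
    printf("elapsed: %llu ns\n", h_elapsed);

    cudaFree(d_data);
    cudaFree(d_elapsed);
    return 0;
}
```

Because the timer is only updated about every µs by default, a very short region may report 0 or a value that jumps in steps of roughly a microsecond, so this approach is better suited to regions that run for many microseconds.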