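```c
static inline unsigned long long tick(void)
{
    unsigned long long d;
    /* On 32-bit x86, "=A" names the EDX:EAX pair; on x86-64 it does not,
       so the high 32 bits of the counter can be lost. */
    __asm__ __volatile__ ("rdtsc" : "=A" (d));
    return d;
}
```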
There are any number of reasons you might see a large number.

Note that rdtsc is not particularly reliable for timing without some extra work, for several reasons.
Most operating systems have a high-precision clock or timing method: on Linux, for example, there is clock_gettime, particularly with the monotonic clocks. (Understand, too, the difference between a wall clock and a monotonic clock: a wall clock can move backwards, even in UTC, for instance when NTP or an administrator steps the system time.) On Windows, I think the recommendation is QueryPerformanceCounter. These clocks typically provide more than enough accuracy for most needs.
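A minimal sketch of interval timing with the monotonic clock, assuming a POSIX system such as Linux. The helper names `monotonic_ns` and `time_one_ms_sleep` are hypothetical, chosen for illustration; only `clock_gettime`, `CLOCK_MONOTONIC`, and `nanosleep` are the real POSIX API.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime and nanosleep */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
   The monotonic clock is not affected by someone setting the wall clock,
   so the difference of two readings is safe to use as an interval. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: sleep for about 1 ms and return the measured
   duration in nanoseconds. */
static int64_t time_one_ms_sleep(void)
{
    struct timespec req = { 0, 1000000L }; /* 1 ms */
    int64_t start = monotonic_ns();
    nanosleep(&req, NULL);
    return monotonic_ns() - start;
}
```

To time your own code, take one `monotonic_ns()` reading before the work and subtract it from a second reading afterwards; CLOCK_REALTIME would give wall-clock time instead, which is exactly what you do not want for intervals.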
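Also, looking at the assembly, it looks like you're only getting 32 bits of the answer: I don't see %edx getting saved after rdtsc. (On x86-64, the "=A" constraint does not mean the EDX:EAX pair, so the compiler keeps only one of the two registers.)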
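One way to get all 64 bits, sketched here for GCC or Clang on x86/x86-64 only (an assumption; the inline-asm syntax is compiler-specific): since rdtsc always puts the low half in EAX and the high half in EDX, ask for the two registers separately and combine them.

```c
#include <stdint.h>

/* Read the x86 time-stamp counter (GCC/Clang inline asm, x86 only).
   rdtsc writes the low 32 bits to EAX and the high 32 bits to EDX;
   "=a" and "=d" bind those registers explicitly, which works the same
   way on 32-bit x86 and on x86-64. */
static inline uint64_t tick(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
```

GCC and Clang also provide the `__rdtsc()` intrinsic in `<x86intrin.h>`, which avoids writing the asm by hand.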
Running your code, I get timings of 120-150 ns for clock_gettime using CLOCK_MONOTONIC, and 70-90 cycles for rdtsc (~20 ns at full speed, but I suspect the processor is clocked down and that it's really about 50 ns). That's on a desktop (I first wrote laptop; darn SSH, I forgot which machine I was on!) running at a roughly constant 20% CPU load. Are you sure your machine isn't bogged down?
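A rough sketch of how per-call costs like these can be estimated: time many back-to-back calls and divide. It assumes Linux with GCC or Clang on x86; `rdtsc_now`, `monotonic_ns`, `clock_gettime_cost_ns`, and `rdtsc_cost_ticks` are hypothetical names for illustration. The rdtsc figure is in TSC ticks, and converting it to nanoseconds would require knowing the TSC frequency, which this sketch does not attempt.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: full 64-bit TSC read (GCC/Clang, x86 only). */
static inline uint64_t rdtsc_now(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: CLOCK_MONOTONIC reading in nanoseconds. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one clock_gettime(CLOCK_MONOTONIC) call in ns,
   estimated by timing n back-to-back calls. */
static double clock_gettime_cost_ns(int n)
{
    struct timespec ts;
    int64_t start = monotonic_ns();
    for (int i = 0; i < n; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)(monotonic_ns() - start) / n;
}

/* Average distance, in TSC ticks, between two consecutive rdtsc reads. */
static double rdtsc_cost_ticks(int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        uint64_t a = rdtsc_now();
        uint64_t b = rdtsc_now();
        total += b - a;
    }
    return (double)total / n;
}
```

Averaging over many calls smooths out one-off interruptions, but the numbers still depend on system load and power management, so run it a few times before trusting a single result.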