I am using rdtsc and cpuid instructions (using volatile inline assembly instructions) to measure the CPU cycles of a program. The rdtsc instruction gives realistic results for m
I don't know if it is (or was) correct, but the code I once used was:
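```c
#include <stdint.h>

typedef unsigned long long Ull;

/* "=A" means EDX:EAX only on 32-bit x86; on x86-64 a 64-bit value is put
   in just one of RAX/RDX, so read the two halves explicitly and combine. */
#define rdtscll(val) do {                                   \
    uint32_t lo_, hi_;                                      \
    __asm__ __volatile__("rdtsc" : "=a" (lo_), "=d" (hi_)); \
    (val) = ((Ull)hi_ << 32) | lo_;                         \
} while (0)

static inline Ull myget_cycles(void)
{
    Ull ret;
    rdtscll(ret);
    return ret;
}
```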
I remember it was "slower" on Intel than on AMD. YMMV.
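If you'd rather not hand-write the constraints at all, GCC and Clang expose RDTSC as the `__rdtsc()` intrinsic in `<x86intrin.h>`. The compiler then picks the registers and combines EDX:EAX itself, so the same source works on 32-bit x86 and x86-64. This is only a sketch assuming GCC or Clang on an x86 target; `myget_cycles_intrin` is a hypothetical name, and this makes no claim about ordering relative to surrounding code.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang: declares __rdtsc() */

/* Hypothetical helper: same job as myget_cycles(), but the compiler
   handles the EDX:EAX register pair for us. */
static inline uint64_t myget_cycles_intrin(void)
{
    return __rdtsc();
}
```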
To prevent an inline `rdtsc` function from being moved across loads, stores, or other operations, you should both write the asm as `__asm__ __volatile__` and include `"memory"` in the clobber list. With `volatile` alone, GCC still won't delete the asm or move it across instructions that use its results (or change its inputs), but it can still move it relative to unrelated operations. The `"memory"` clobber tells GCC it cannot assume that memory contents (any variable whose address may have leaked) stay the same across the asm, which makes the asm much harder to move. However, GCC may still move it across instructions that only modify local variables whose address was never taken, since those are not "memory".
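A sketch of the asm with both `volatile` and the `"memory"` clobber. It also puts `cpuid` in front, as in the question, because the clobber only restrains the compiler; `cpuid` is a serializing instruction, so the CPU can't start `rdtsc` early either. It assumes GCC or Clang on x86-64 (the `"rbx"` clobber name is 64-bit only), and `rdtsc_serialized` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: CPUID (leaf 0) serializes execution, then RDTSC
   reads the counter into EDX:EAX. CPUID also overwrites EBX and ECX, so
   they are clobbered; "memory" stops GCC from moving loads/stores of
   escaped variables across this asm. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "cpuid\n\t"
        "rdtsc"
        : "=a" (lo), "=d" (hi)
        : "a" (0)
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Typical use is to bracket the code under test with two calls and subtract; the difference still includes the cost of `cpuid` itself.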
Oh, and as wildplasser said in a comment, check the asm output before you waste a lot of time on this.