I want to calculate time elapsed during a function call in C, to the precision of 1 nanosecond.
Is there a timer function available in C to do it?
If yes ple
We all waste our time recreating this test sample. Why not post something compile ready? Anyway, here is mine with results.
CLOCK_PROCESS_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 4194304 iterations : 459.427311 msec 0.110 microsec / call
CLOCK_MONOTONIC resolution: 0 sec 1 nano
clock_gettime 4194304 iterations : 64.498347 msec 0.015 microsec / call
CLOCK_REALTIME resolution: 0 sec 1 nano
clock_gettime 4194304 iterations : 65.494828 msec 0.016 microsec / call
CLOCK_THREAD_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 4194304 iterations : 427.133157 msec 0.102 microsec / call
rdtsc 4194304 iterations : 115.427895 msec 0.028 microsec / call
Dummy 16110479703957395943
rdtsc in milliseconds 4194304 iterations : 197.259866 msec 0.047 microsec / call
Dummy 4.84682e+08 UltraHRTimerMs 197 HRTimerMs 197.26
#include <time.h>
#include <stdint.h> // uint64_t, uint32_t
#include <cstdio>
#include <string>
#include <iostream>
#include <chrono>
#include <thread>
enum { TESTRUNS = 1024*1024*4 };
class HRCounter
{
private:
timespec start, tmp;
public:
HRCounter(bool init = true)
{
if(init)
SetStart();
}
void SetStart()
{
clock_gettime(CLOCK_MONOTONIC, &start);
}
double GetElapsedMs()
{
clock_gettime(CLOCK_MONOTONIC, &tmp);
return (double)(tmp.tv_nsec - start.tv_nsec) / 1000000 + (tmp.tv_sec - start.tv_sec) * 1000;
}
};
__inline__ uint64_t rdtsc(void) {
uint32_t lo, hi;
__asm__ __volatile__ ( // serialize
"xorl %%eax,%%eax \n cpuid"
::: "%rax", "%rbx", "%rcx", "%rdx");
/* We cannot use "=A", since this would use %rax on x86_64 and return only the lower 32bits of the TSC */
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return (uint64_t)hi << 32 | lo;
}
inline uint64_t GetCyclesPerMillisecondImpl()
{
uint64_t start_cyles = rdtsc();
HRCounter counter;
std::this_thread::sleep_for (std::chrono::seconds(3));
uint64_t end_cyles = rdtsc();
double elapsed_ms = counter.GetElapsedMs();
return (end_cyles - start_cyles) / elapsed_ms;
}
inline uint64_t GetCyclesPerMillisecond()
{
static uint64_t cycles_in_millisecond = GetCyclesPerMillisecondImpl();
return cycles_in_millisecond;
}
class UltraHRCounter
{
private:
uint64_t start_cyles;
public:
UltraHRCounter(bool init = true)
{
GetCyclesPerMillisecond();
if(init)
SetStart();
}
void SetStart() { start_cyles = rdtsc(); }
double GetElapsedMs()
{
uint64_t end_cyles = rdtsc();
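// Note: both operands are uint64_t, so this is an integer division and
// the result is truncated to whole milliseconds.
return (end_cyles - start_cyles) / GetCyclesPerMillisecond();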
}
};
int main()
{
auto Run = [](std::string const& clock_name, clockid_t clock_id)
{
HRCounter counter(false);
timespec spec;
clock_getres( clock_id, &spec );
printf("%s resolution: %ld sec %ld nano\n", clock_name.c_str(), spec.tv_sec, spec.tv_nsec );
counter.SetStart();
for ( int i = 0 ; i < TESTRUNS ; ++ i )
{
clock_gettime( clock_id, &spec );
}
double fb = counter.GetElapsedMs();
printf( "clock_gettime %d iterations : %.6f msec %.3f microsec / call\n", TESTRUNS, ( fb ), (( fb ) * 1000) / TESTRUNS );
};
Run("CLOCK_PROCESS_CPUTIME_ID",CLOCK_PROCESS_CPUTIME_ID);
Run("CLOCK_MONOTONIC",CLOCK_MONOTONIC);
Run("CLOCK_REALTIME",CLOCK_REALTIME);
Run("CLOCK_THREAD_CPUTIME_ID",CLOCK_THREAD_CPUTIME_ID);
{
HRCounter counter(false);
uint64_t dummy = 0; // must be initialized: it is accumulated into below
counter.SetStart();
for ( int i = 0 ; i < TESTRUNS ; ++ i )
{
dummy += rdtsc();
}
double fb = counter.GetElapsedMs();
printf( "rdtsc %d iterations : %.6f msec %.3f microsec / call\n", TESTRUNS, ( fb ), (( fb ) * 1000) / TESTRUNS );
std::cout << "Dummy " << dummy << std::endl;
}
{
double dummy = 0; // must be initialized: it is accumulated into below
UltraHRCounter ultra_hr_counter;
HRCounter counter;
for ( int i = 0 ; i < TESTRUNS ; ++ i )
{
dummy += ultra_hr_counter.GetElapsedMs();
}
double fb = counter.GetElapsedMs();
double final = ultra_hr_counter.GetElapsedMs();
printf( "rdtsc in milliseconds %d iterations : %.6f msec %.3f microsec / call\n", TESTRUNS, ( fb ), (( fb ) * 1000) / TESTRUNS );
std::cout << "Dummy " << dummy << " UltraHRTimerMs " << final << " HRTimerMs " << fb << std::endl;
}
return 0;
}
There is no timer in C that has guaranteed 1 nanosecond precision. You may want to look into clock() or, better yet, the POSIX gettimeofday(), which reports time in seconds and microseconds.
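To illustrate the gettimeofday() suggestion, here is a minimal sketch in C++ (the language of the benchmark posted earlier in this thread). ElapsedMicros and TimeCallMicros are names made up for this example, not part of any library, and the result can be no finer than a microsecond. gettimeofday() reads the wall clock, so the result can be off if the system time is adjusted during the measurement.

```cpp
#include <sys/time.h>   // gettimeofday, struct timeval (POSIX)

// Difference between two timeval readings, in microseconds.
inline long long ElapsedMicros(const timeval& start, const timeval& end)
{
    return (end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_usec - start.tv_usec);
}

// Hypothetical helper: wall-clock time of a single call to fn, in microseconds.
template <typename Fn>
long long TimeCallMicros(Fn fn)
{
    timeval start, end;
    gettimeofday(&start, nullptr);
    fn();
    gettimeofday(&end, nullptr);
    return ElapsedMicros(start, end);
}
```

Usage would be something like `long long us = TimeCallMicros([]{ my_function(); });`, where my_function stands for whatever you want to time.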
I don't know if you'll find any timers that provide resolution to a single nanosecond -- it would depend on the resolution of the system clock -- but you might want to look at http://code.google.com/p/high-resolution-timer/. They indicate they can provide resolution to the microsecond level on most Linux systems and in the nanoseconds on Sun systems.
You are asking for something that is not possible this way. You would need HW level support to get to that level of precision, and even then you would have to control the variables very carefully. What happens if you get an interrupt while running your code? What if the OS decides to run some other piece of code?
And what does your code do? Does it use RAM? What if your code and/or data is or is not in the cache?
In some environments you can use HW level counters for this job provided you control those variables. But how do you prevent context switches in Linux?
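This does not prevent context switches, but one common way to reduce their effect on a measurement is to repeat the run several times and keep the fastest one, on the assumption that interrupts and cache misses only ever add time. The sketch below does that with clock_gettime(CLOCK_MONOTONIC); DiffNs and MinElapsedNs are hypothetical names used only for this illustration.

```cpp
#include <time.h>     // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <stdint.h>   // int64_t
#include <cassert>
#include <limits>

// Nanoseconds between two timespec readings.
inline int64_t DiffNs(const timespec& start, const timespec& end)
{
    return static_cast<int64_t>(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

// Hypothetical helper: run fn `runs` times and return the fastest run in ns.
// Runs disturbed by an interrupt or a context switch are simply discarded.
template <typename Fn>
int64_t MinElapsedNs(Fn fn, int runs)
{
    assert(runs > 0);
    int64_t best = std::numeric_limits<int64_t>::max();
    for (int i = 0; i < runs; ++i)
    {
        timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        fn();
        clock_gettime(CLOCK_MONOTONIC, &end);
        int64_t ns = DiffNs(start, end);
        if (ns < best)
            best = ns;
    }
    return best;
}
```

The minimum still includes the cost of the two clock reads, so it is an upper bound on the fastest run rather than an exact figure.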
For instance, in Texas Instruments' DSP tools (Code Composer Studio) you can profile the code very exactly because the whole debugging environment is set up so that the emulator (e.g. Blackhawk) receives info about every operation run. You can also set watchpoints which are coded directly into a HW block inside the chip in some processors. This works because the memory lanes are also routed to this debugging block.
They do offer functions in their CSL (Chip Support Library) which do what you are asking for, with the timing overhead being a few cycles. But this is only available for their processors and is completely dependent on reading the timer values from the HW registers.
Making benchmarks on this scale is not a good idea. At the very least you have the overhead of reading the time itself, which can render your results unreliable if you work in nanoseconds. You can either use your platform's system calls or boost::Date_Time on a larger scale [preferred].
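As a rough sketch of how to deal with that overhead, you can estimate the cost of the clock read itself and time a whole batch of calls instead of a single one, so that the two clock reads are spread over many iterations. This assumes a POSIX system with clock_gettime; NowNs, TimerOverheadNs and AverageNsPerCall are names invented for this example.

```cpp
#include <time.h>     // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <stdint.h>   // int64_t
#include <cassert>

// Current CLOCK_MONOTONIC reading in nanoseconds.
inline int64_t NowNs()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Average cost of one clock read, estimated from back-to-back reads.
inline double TimerOverheadNs(int samples)
{
    assert(samples > 0);
    int64_t start = NowNs();
    for (int i = 0; i < samples; ++i)
        NowNs();
    return static_cast<double>(NowNs() - start) / samples;
}

// Hypothetical helper: average time per call of fn over `iterations` calls.
// The clock is read only twice, so its overhead is amortized over the batch.
template <typename Fn>
double AverageNsPerCall(Fn fn, int iterations)
{
    assert(iterations > 0);
    int64_t start = NowNs();
    for (int i = 0; i < iterations; ++i)
        fn();
    return static_cast<double>(NowNs() - start) / iterations;
}
```

If AverageNsPerCall for your function is not clearly larger than TimerOverheadNs, a single timed call would mostly be measuring the timer. Note that the average also includes loop overhead, and an optimizing compiler may remove calls whose results are unused.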
On Intel and compatible processors you can use the rdtsc instruction, which can easily be wrapped in an asm() block in C code. It returns the value of a built-in processor cycle counter that increments on each cycle. You gain high resolution, and such timing is extremely fast.
To find out how fast the counter increments you'll need to calibrate: read it twice, a fixed time period apart (say five seconds), and divide the difference by the elapsed time. If you do this on a processor that shifts frequency to lower power consumption, you may have problems calibrating.