Timer function to provide time in nano seconds using C++

前端未结

关注

 16  2307

野趣味 2020-11-22 06:02

I wish to calculate the time it took for an API to return a value. The time taken for such an action is in the space of nano seconds. As the API is a C++ class/function, I a

16条回答

借酒劲吻你 (楼主)

2020-11-22 06:24
This new answer uses C++11's facility. While there are other answers that show how to use , none of them shows how to use with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with . Additionally I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter.

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.

First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:
```
#include 

namespace x
{

struct clock
{
    typedef unsigned long long                 rep;
    typedef std::ratio<1, 2'800'000'000>       period; // My machine is 2.8 GHz
    typedef std::chrono::duration duration;
    typedef std::chrono::time_point     time_point;
    static const bool is_steady =              true;

    static time_point now() noexcept
    {
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return time_point(duration(static_cast(hi) << 32 | lo));
    }
};

}  // x
```
All this clock does is count CPU cycles and store it in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).

To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurate you are able to supply the clock speed, the more accurate will be the conversion to nanoseconds, (milliseconds, whatever).

Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:
```
#include 

template 
void
test_empty_loop()
{
    // Define real time units
    typedef std::chrono::duration picoseconds;
    // or:
    // typedef std::chrono::nanoseconds nanoseconds;
    // Define double-based unit of clock tick
    typedef std::chrono::duration Cycle;
    using std::chrono::duration_cast;
    const int N = 100000000;
    // Do it
    auto t0 = clock::now();
    for (int j = 0; j < N; ++j)
        asm volatile("");
    auto t1 = clock::now();
    // Get the clock ticks per iteration
    auto ticks_per_iter = Cycle(t1-t0)/N;
    std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
    // Convert to real time units
    std::cout << duration_cast(ticks_per_iter).count()
              << "ps per iteration\n";
}
```
The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:
```
typename clock::time_point t0 = clock::now();
```
(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).

To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.

To use this code is simple:
```
int main()
{
    std::cout << "\nUsing rdtsc:\n";
    test_empty_loop();

    std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
    test_empty_loop();

    std::cout << "\nUsing std::chrono::system_clock:\n";
    test_empty_loop();
}
```
Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:
```
Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration

Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration

Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration
```
This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).

Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:
1. The clock speed of my machine in order to define x::clock.
2. The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.
0 讨论(0)

查看其它16个回答
发布评论:

提交评论
- 加载中...