RDTSCP versus RDTSC + CPUID

忘掉有多难 2020-12-24 09:35

I'm doing some Linux kernel timings, specifically in the interrupt handling path. I've been using RDTSC for timings, however I recently learned it's not necessarily accur

4 answers
  •  被撕碎了的回忆
    2020-12-24 10:07

    Is RDTSCP truly accurate as a point of measurement, and is it the "correct" way of doing the timing?

    Modern x86 CPUs can dynamically adjust their frequency, underclocking to save power (e.g. Intel's SpeedStep) and overclocking to boost performance under heavy load (e.g. Intel's Turbo Boost). The time stamp counter on these processors, however, counts at a constant rate (look for the "constant_tsc" flag in Linux's /proc/cpuinfo).
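
    As a rough sketch of that check, the helper below scans a cpuinfo-style file for a given flag. The function name `cpuinfo_has_flag` is made up for this illustration, and the code assumes the x86 layout of /proc/cpuinfo, where each CPU has a line beginning with "flags" that fits in the read buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a "flags" line in the cpuinfo-style file
 * at `path` lists `flag` as a whole word, 0 if not, -1 if the file cannot be
 * opened. Assumes each line fits in the buffer. */
int cpuinfo_has_flag(const char *path, const char *flag)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[8192];
    int found = 0;
    while (!found && fgets(line, sizeof line, f)) {
        /* The first token is the key, e.g. "flags" in "flags\t\t: fpu vme ...". */
        char *tok = strtok(line, " \t\n:");
        if (!tok || strcmp(tok, "flags") != 0)
            continue;
        /* The remaining tokens are the individual CPU flags. */
        while ((tok = strtok(NULL, " \t\n:")) != NULL) {
            if (strcmp(tok, flag) == 0) {
                found = 1;
                break;
            }
        }
    }
    fclose(f);
    return found;
}
```

    Calling `cpuinfo_has_flag("/proc/cpuinfo", "constant_tsc")` should return 1 on a machine whose TSC counts at a constant rate.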

    So the answer to your question depends on what you really want to know. Unless dynamic frequency scaling is disabled (e.g. in the BIOS), the time stamp counter can no longer be relied on to determine the number of cycles that have elapsed. However, it can still be relied on, with some care, to determine the time that has elapsed (I use clock_gettime in C for that; see the end of my answer).
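
    To illustrate using the counter for time rather than cycles, here is a minimal sketch that estimates how many TSC ticks correspond to one second by comparing against clock_gettime. It assumes GCC or Clang on x86 (for `__rdtscp` from `<x86intrin.h>`) and a constant-rate TSC; the function names are invented for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>  /* __rdtscp; GCC/Clang on x86 only */

/* Read the time stamp counter with RDTSCP, which waits for earlier
 * instructions to execute before reading the counter. */
static inline uint64_t read_tsc(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX; unused in this sketch */
    return __rdtscp(&aux);
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Estimate TSC ticks per second by busy-waiting for roughly `interval`
 * seconds and dividing the tick delta by the measured wall time. */
double tsc_ticks_per_second(double interval)
{
    double t0 = now_seconds();
    uint64_t c0 = read_tsc();
    double t1;
    do {
        t1 = now_seconds();
    } while (t1 - t0 < interval);
    uint64_t c1 = read_tsc();
    return (double)(c1 - c0) / (t1 - t0);
}

/* Convert a tick delta to seconds using a previously estimated rate. */
double tsc_to_seconds(uint64_t ticks, double ticks_per_second)
{
    return (double)ticks / ticks_per_second;
}
```

    With the rate estimated once, a tick delta taken around the code under test converts to seconds via `tsc_to_seconds`. The result is elapsed time, not core cycles, whenever the core frequency differs from the TSC rate.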

    To benchmark my matrix multiplication code and compare it to the theoretical best, I need to know both the time elapsed and the cycles elapsed (or rather the effective frequency during the test).

    Let me present three different methods to determine the number of cycles elapsed.

    1. Disable dynamic frequency scaling in the BIOS and use the time stamp counter.
    2. For Intel processors, request the core clock cycles from the performance monitor counters.
    3. Measure the frequency under load.

    The first method is the most reliable, but it requires access to the BIOS and affects the performance of everything else you run (when I disable dynamic frequency scaling on my i5-4250U it runs at a constant 1.3 GHz, its base frequency, instead of boosting up to 2.6 GHz). It's also inconvenient to change the BIOS only for benchmarking.

    The second method is useful when you don't want to disable dynamic frequency scaling and/or for systems you don't have physical access to. However, the performance monitor counters require privileged instructions which only the kernel or device drivers have access to.
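
    On Linux, the kernel exposes these counters to user space through the perf_event_open system call, subject to the /proc/sys/kernel/perf_event_paranoid setting. The sketch below counts user-space core cycles around a function call this way. `count_user_cycles` and `sample_work` are names made up for this example, and the call returns -1 where the counter is unavailable (for instance in many virtual machines).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical workload standing in for the code being measured. */
void sample_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Count user-space CPU cycles spent in work() on the calling thread.
 * Returns the cycle count, or -1 if the counter could not be opened or read. */
long long count_user_cycles(void (*work)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* glibc has no wrapper for perf_event_open, so go through syscall().
     * pid = 0 and cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, (pid_t)0, -1, -1, 0UL);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long cycles = 0;
    if (read(fd, &cycles, sizeof cycles) != (ssize_t)sizeof cycles)
        cycles = -1;
    close(fd);
    return cycles;
}
```

    Dividing the returned count by the elapsed wall time of the same region gives the effective frequency during the test.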

    The third method is useful on systems where you have neither physical nor privileged access. This is the method I use most in practice. In principle it's the least reliable, but in practice it has been as reliable as the second method.
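
    One way the third method could look, as a sketch only: time a long chain of dependent integer increments and assume each iteration costs about one core clock cycle. That assumption holds only for an optimized build on a CPU that sustains one such iteration per cycle, so treat the result as an estimate. The function name is invented here, and the empty inline asm is a GCC/Clang extension.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Run `iterations` dependent increments and return iterations divided by the
 * elapsed seconds. If one iteration takes one core cycle (an assumption, see
 * the text), this approximates the core frequency in Hz under load.
 * Use a large iteration count so the elapsed time is well above zero. */
double estimate_frequency_hz(uint64_t iterations)
{
    struct timespec t0, t1;
    uint64_t x = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iterations; i++) {
        x += 1;
        /* Empty asm that claims to modify x: keeps the compiler from folding
         * the whole chain into a single addition. */
        __asm__ volatile("" : "+r"(x));
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return (double)iterations / seconds;
}
```

    Multiplying this estimate by the elapsed time of the real benchmark gives an approximate cycle count without BIOS changes or privileged access.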

    Here is how I determine the time elapsed (in seconds) with C.

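    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    /* CLOCK_MONOTONIC is unaffected by wall-clock adjustments, which suits interval timing. */
    #define TIMER_TYPE CLOCK_MONOTONIC

    /* Placeholder for the code being timed. */
    static void foo(void) { }

    /* Return end - start in seconds. */
    double time_diff(struct timespec start, struct timespec end)
    {
        struct timespec temp;
        if ((end.tv_nsec - start.tv_nsec) < 0) {
            /* Borrow one second when the nanosecond difference is negative. */
            temp.tv_sec = end.tv_sec - start.tv_sec - 1;
            temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
        } else {
            temp.tv_sec = end.tv_sec - start.tv_sec;
            temp.tv_nsec = end.tv_nsec - start.tv_nsec;
        }
        return (double)temp.tv_sec + (double)temp.tv_nsec * 1E-9;
    }

    int main(void)
    {
        struct timespec time1, time2;
        clock_gettime(TIMER_TYPE, &time1);
        foo();
        clock_gettime(TIMER_TYPE, &time2);
        double dtime = time_diff(time1, time2);
        printf("elapsed: %.9f s\n", dtime);
        return 0;
    }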
    
