RDTSCP versus RDTSC + CPUID

忘掉有多难 2020-12-24 09:35

I'm doing some Linux kernel timings, specifically in the interrupt-handling path. I've been using RDTSC for timings; however, I recently learned it's not necessarily accurate…

4 answers
  •  夕颜 (OP)
    2020-12-24 10:05

    The 2010 Intel paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures" can be considered outdated when it comes to its recommendation to combine RDTSC/RDTSCP with CPUID.

    Current Intel reference documentation recommends fencing instructions as more efficient alternatives to CPUID:

    Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient method of controlling memory ordering than the CPUID instruction.

    (Intel® 64 and IA-32 Architectures Software Developer’s Manual: Volume 3, Section 8.2.5, September 2016)

    If software requires RDTSC to be executed only after all previous instructions have executed and all previous loads and stores are globally visible, it can execute the sequence MFENCE;LFENCE immediately before RDTSC.

    (Intel RDTSC)

    Thus, to get the TSC start value, execute this instruction sequence:

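    mfence                ; wait until all earlier loads and stores are globally visible
    lfence                ; wait until all earlier instructions have completed locally
    rdtsc                 ; EDX:EAX = time-stamp counter (upper halves of RAX/RDX cleared)
    shl     rdx, 0x20     ; move the high 32 bits into position
    or      rax, rdx      ; RAX = full 64-bit TSC value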
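
    A minimal C sketch of the start sequence, assuming GCC or Clang on x86-64 and GCC's default AT&T inline-assembly syntax; the function name tsc_start is hypothetical. The shift/OR step is done in C, and the "memory" clobber additionally keeps the compiler from moving memory accesses across the read:

```c
#include <stdint.h>

/* Hypothetical helper: start timestamp via MFENCE; LFENCE; RDTSC.
 * MFENCE waits for earlier loads and stores to become globally visible,
 * LFENCE waits for earlier instructions to complete, and only then is the
 * time-stamp counter read into EDX:EAX. */
static inline uint64_t tsc_start(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("mfence\n\t"
                         "lfence\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    /* Equivalent of: shl rdx, 0x20; or rax, rdx */
    return ((uint64_t)hi << 32) | lo;
}
```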
    

    At the end of your benchmark, to get the TSC stop value:

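    rdtscp                ; wait for earlier instructions and loads, then read TSC into EDX:EAX (ECX = IA32_TSC_AUX)
    lfence                ; keep later instructions from starting before the counter is read
    shl     rdx, 0x20     ; move the high 32 bits into position
    or      rax, rdx      ; RAX = full 64-bit TSC value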
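
    The end sequence can be sketched the same way, again assuming GCC or Clang on x86-64 with a hypothetical function name (tsc_stop). Since RDTSCP also writes IA32_TSC_AUX into ECX, RCX has to be declared as clobbered:

```c
#include <stdint.h>

/* Hypothetical helper: end timestamp via RDTSCP; LFENCE.
 * RDTSCP waits until all earlier instructions have executed and earlier
 * loads are globally visible before reading the counter; the LFENCE keeps
 * later instructions from starting before that read.
 * RDTSCP writes IA32_TSC_AUX to ECX, hence the "rcx" clobber. */
static inline uint64_t tsc_stop(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtscp\n\t"
                         "lfence"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```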
    

    Note that, unlike CPUID (which overwrites EAX, EBX, ECX and EDX), LFENCE doesn't modify any registers, so it isn't necessary to save the EDX:EAX result of RDTSCP before executing the fence.
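
    For contrast, a sketch of the CPUID-based end sequence in the style of the 2010 paper, assuming GCC or Clang on x86-64; the function name tsc_stop_cpuid is hypothetical, and zeroing EAX to select CPUID leaf 0 is an added detail. Because CPUID overwrites all four registers, the RDTSCP result must first be copied out of EDX:EAX, and RAX, RBX, RCX and RDX are all declared as clobbered:

```c
#include <stdint.h>

/* Hypothetical helper: end timestamp via RDTSCP followed by CPUID.
 * The TSC value is moved out of EDX:EAX into compiler-chosen registers
 * before CPUID, which overwrites EAX, EBX, ECX and EDX. */
static inline uint64_t tsc_stop_cpuid(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtscp\n\t"
                         "mov %%edx, %0\n\t"   /* save high half */
                         "mov %%eax, %1\n\t"   /* save low half */
                         "xor %%eax, %%eax\n\t" /* select CPUID leaf 0 */
                         "cpuid"
                         : "=r"(hi), "=r"(lo)
                         :
                         : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

    Compared with the LFENCE version, this costs two extra moves and four clobbered registers; CPUID is also comparatively slow and, under virtualization, typically causes an exit to the hypervisor.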

    Relevant documentation snippet:

    If software requires RDTSCP to be executed prior to execution of any subsequent instruction (including any memory accesses), it can execute LFENCE immediately after RDTSCP (Intel RDTSCP)

    For an example of how to integrate this into a C program, see also my GCC inline assembler implementations of the above operations.
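
    If inline assembly is undesirable, the same two sequences can be sketched with the compiler intrinsics from <x86intrin.h> (_mm_mfence, _mm_lfence, __rdtsc, __rdtscp), assuming GCC or Clang on x86-64. This is a swapped-in alternative, not the inline-assembler implementation referred to above, and the function names are hypothetical. Unlike an asm statement with a "memory" clobber, these intrinsics do not necessarily act as a full compiler barrier for the surrounding code, so it is worth checking the generated assembly:

```c
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical helpers built on compiler intrinsics instead of inline asm. */

/* Start: MFENCE; LFENCE; RDTSC */
static inline uint64_t bench_start(void)
{
    _mm_mfence();
    _mm_lfence();
    return __rdtsc();
}

/* Stop: RDTSCP; LFENCE */
static inline uint64_t bench_stop(void)
{
    unsigned int aux; /* receives IA32_TSC_AUX, which RDTSCP writes to ECX */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
```

    The code under test goes between a bench_start() and a bench_stop() call; the difference of the two values is the elapsed time in TSC ticks, including some fixed measurement overhead.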
