How to benchmark in Qemu i386 system using rdtsc

Submitted by 自作多情 on 2020-05-26 09:46:59

Question


I am trying to measure the number of clock cycles that the same operation takes in two different programming languages, in the same environment and without using an OS.

I am currently using the QEMU i386 emulator, with rdtsc to measure the clock cycles.

/* Return the number of CPU ticks since boot. */
static inline u64 rdtsc(void)
{
    u32 hi, lo;
    // asm("cpuid");
    /* volatile keeps the compiler from merging or dropping repeated reads */
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((u64) lo) | (((u64) hi) << 32);
}

Taking the difference between the rdtsc values read before and after the operation should give the number of clock cycles.

    start_time = rdtsc();
    operation();
    stop_time = rdtsc();
    num_cycles = stop_time-start_time;

But the difference is not constant, even over hundreds of iterations; it varies by a few thousand cycles.

  • Is there any better way of measuring clock cycles?

  • Also, is there any way to pass the CPU frequency to QEMU as an input parameter? Currently I am using:

qemu-system-i386 -kernel out.elf


Answer 1:


Trying to benchmark guest software under QEMU emulation is at best extremely difficult. QEMU's emulation does not have performance characteristics that are anything like a real hardware CPU's: some operations that are fast on hardware, like floating point, are very slow on QEMU; we don't model caches and you won't see anything like the performance curves you would see as data sets reach cache line or L1/L2/etc cache size limits; and so on.

Important factors in performance on a modern CPU include (at least):

  • raw instruction counts executed
  • TLB misses
  • branch predictor misses
  • cache misses

QEMU doesn't track any of the last three and only makes a vague attempt at the first one if you use the -icount option. (In particular, without -icount the RDTSC value we provide to the guest under emulation is more-or-less just the host CPU RDTSC value, so times measured with it will include all sorts of QEMU overhead including time spent translating guest code.)

Assuming you're on an x86 host, you could try the -enable-kvm option to run this under a KVM virtual machine. Then at least you'll be looking at the real performance of a hardware CPU, though you will still see some noise from the overhead as other host processes contend for CPU with the VM.
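
If you do measure under KVM, one common way to cope with the remaining noise is to repeat the measurement and keep the smallest value seen. The following is only a minimal sketch of that idea, not something taken from QEMU: the names rdtsc_fenced, min_cycles and sample_op are hypothetical, the LFENCE instructions assume a CPU with SSE2, and how strongly LFENCE orders RDTSC depends on the CPU vendor and configuration.

```c
#include <stdint.h>

/* Read the time-stamp counter. The LFENCE before and after is meant to
 * keep neighbouring instructions from drifting across the read
 * (assumption: the CPU supports SSE2 and treats LFENCE as a barrier). */
static inline uint64_t rdtsc_fenced(void)
{
    uint32_t hi, lo;
    __asm__ volatile("lfence\n\trdtsc\n\tlfence"
                     : "=a"(lo), "=d"(hi)
                     :
                     : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical stand-in for the operation being benchmarked. */
static void sample_op(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}

/* Run op() `runs` times and return the smallest cycle count observed.
 * Runs that were interrupted or descheduled show up as larger values
 * and are discarded. Returns UINT64_MAX if runs is 0. */
static uint64_t min_cycles(void (*op)(void), unsigned runs)
{
    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = rdtsc_fenced();
        op();
        uint64_t stop = rdtsc_fenced();
        if (stop - start < best)
            best = stop - start;
    }
    return best;
}
```

Calling min_cycles(sample_op, 1000) would then give a lower bound on the cost of sample_op, including the overhead of the two counter reads. The minimum is used instead of the mean because disturbances from the host can only add cycles, never remove them.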



Source: https://stackoverflow.com/questions/33925699/how-to-benchmark-in-qemu-i386-system-using-rdtsc
