Is the UNIX `time` command accurate enough for benchmarks? [closed]


Question


Let's say I wanted to benchmark two programs: foo.py and bar.py.

Are a couple thousand runs, and the respective averages, of `time python foo.py` and `time python bar.py` adequate for profiling and comparing their speed?


Edit: Additionally, if the execution of each program was sub-second (assume it wasn't for the above), would time still be okay to use?
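(For concreteness, this is roughly the kind of measurement loop I mean — a minimal sketch in Python, where foo.py stands for one of the scripts under test and the helper name bench is just for illustration:)

    import statistics
    import subprocess
    import time

    def bench(cmd, runs=1000):
        """Run cmd repeatedly and return mean/stdev of wall-clock time."""
        samples = []
        for _ in range(runs):
            start = time.perf_counter()  # wall-clock timer
            subprocess.run(cmd, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            samples.append(time.perf_counter() - start)
        return statistics.mean(samples), statistics.stdev(samples)

    mean, stdev = bench(["python", "foo.py"])
    print(f"foo.py: {mean:.4f}s +/- {stdev:.4f}s")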


Answer 1:


time produces good enough timings for benchmarks that run for over a second; otherwise, the time spent exec()ing the process may be large compared to its run time.

However, when benchmarking you should watch out for context switching: another process may be using the CPU, contending with your benchmark and increasing its run time. To avoid contention with other processes, run the benchmark like this:

sudo chrt -f 99 /usr/bin/time --verbose <benchmark>

Or

sudo chrt -f 99 perf stat -ddd <benchmark>

sudo chrt -f 99 runs your benchmark in the FIFO real-time scheduling class with priority 99, which makes it the highest-priority process and avoids context switching (you can edit /etc/security/limits.conf so that real-time priorities do not require a privileged process).
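For reference, a sketch of what such limits.conf entries might look like (the user name youruser is a placeholder):

    # /etc/security/limits.conf -- hypothetical entries letting "youruser"
    # request real-time priorities up to 99 without being root
    youruser  soft  rtprio  99
    youruser  hard  rtprio  99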

The --verbose flag also makes time report all the available stats, including the number of context switches your benchmark incurred, which should normally be 0; otherwise you may want to rerun the benchmark.

perf stat -ddd is even more informative than /usr/bin/time and displays information such as instructions per cycle, branch and cache misses, etc.

It is also better to disable CPU frequency scaling and turbo boost, so that the CPU frequency stays constant during the benchmark and the results are consistent.




Answer 2:


Nowadays, IMO, there is no reason to use time for benchmarking. Use perf stat instead: it gives you much more useful information, can repeat the benchmark any given number of times, and does statistics on the results, i.e. it calculates the mean and variance. It is much more reliable and just as simple to use as time:

perf stat -r 10 -d <your app and arguments>

The -r 10 runs your app 10 times and does statistics over the runs. -d outputs some more data, such as cache misses.

So while time might be reliable enough for long-running applications, it is definitely not as reliable as perf stat. Use that instead.

Addendum: if you really want to keep using time, at least don't use the bash built-in, but the real binary in verbose mode:

/usr/bin/time -v <some command with arguments>

The output then looks like this (here for ls):

    Command being timed: "ls"
    User time (seconds): 0.00
    System time (seconds): 0.00
    Percent of CPU this job got: 0%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1968
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 93
    Voluntary context switches: 1
    Involuntary context switches: 2
    Swaps: 0
    File system inputs: 8
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Note especially that this can measure the peak RSS, which is often enough if you want to compare the effect of a patch on peak memory consumption: compare that value before and after, and if there is a significant decrease in the RSS peak, you did something right.
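If you prefer to capture the same number from Python, here is a minimal sketch using the standard resource and subprocess modules (foo.py is a placeholder for the program under test; ru_maxrss is reported in kilobytes on Linux but in bytes on macOS):

    # Sketch: peak RSS of a finished child process, similar in spirit to the
    # "Maximum resident set size" line from /usr/bin/time -v.
    import resource
    import subprocess

    subprocess.run(["python", "foo.py"], check=True)  # placeholder command
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    print(f"peak RSS of children: {peak} kB on Linux (bytes on macOS)")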




Answer 3:


Yes, time is accurate enough, and you'll only need to run your programs a dozen times (provided each run lasts more than a second, or at least a significant fraction of a second, i.e. more than 200 milliseconds). Of course, the file system will be hot (i.e. small files will already be cached in RAM) for most runs except the first, so take that into account.

The reason you want the timed run to last at least a few tenths of a second is the accuracy and granularity of the time measurement: don't expect better than a hundredth of a second of accuracy (you need a special kernel option to get it down to one millisecond).

From inside the application, you could use clock, clock_gettime, gettimeofday, getrusage, or times (they surely have Python equivalents).
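For instance, a minimal sketch of such Python equivalents from the standard time module (perf_counter for wall-clock time, process_time for CPU time; the squaring loop is just a placeholder workload):

    import time

    start_wall = time.perf_counter()  # high-resolution wall-clock timer
    start_cpu = time.process_time()   # CPU time of the current process

    total = sum(i * i for i in range(1_000_000))  # placeholder workload

    print(f"wall: {time.perf_counter() - start_wall:.6f} s")
    print(f"cpu:  {time.process_time() - start_cpu:.6f} s")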

Don't forget to read the time(7) man page.




Answer 4:


Yes. The time command gives both elapsed (wall-clock) time and consumed CPU time. The latter is probably what you should focus on, unless you're doing a lot of I/O. If elapsed time is important, make sure the system has no other significant activity while you run your test.



Source: https://stackoverflow.com/questions/9006596/is-the-unix-time-command-accurate-enough-for-benchmarks
