问题
I am trying to familiarize myself with perf
and run it against various programs I wrote.
When I launch it against program that is 100% single threaded, perf shows that it takes two cores on machine (task-clock event). Here's the example output:
perf stat -a --per-core python3 test.py
Performance counter stats for 'system wide':
S0-C0 1 19004.951263 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C0 1 5,582 context-switches (100.00%)
S0-C0 1 19 cpu-migrations (100.00%)
S0-C0 1 3,746 page-faults
S0-C0 1 <not supported> cycles
S0-C0 1 <not supported> stalled-cycles-frontend
S0-C0 1 <not supported> stalled-cycles-backend
S0-C0 1 <not supported> instructions
S0-C0 1 <not supported> branches
S0-C0 1 <not supported> branch-misses
S0-C1 1 19004.950059 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C1 1 6,752 context-switches (100.00%)
S0-C1 1 25 cpu-migrations (100.00%)
S0-C1 1 935 page-faults
S0-C1 1 <not supported> cycles
S0-C1 1 <not supported> stalled-cycles-frontend
S0-C1 1 <not supported> stalled-cycles-backend
S0-C1 1 <not supported> instructions
S0-C1 1 <not supported> branches
S0-C1 1 <not supported> branch-misses
19.004688019 seconds time elapsed
It even shows that simple sleep
command takes two cores on my computer and I can't explain this. I understand that OS scheduler can reassign active core for any process, but in this case CPU utilization would reflect that.
Can anyone explain this?
Thanks!
回答1:
According to man page of perf stat
subocmmand, you have -a
option to profile full system:
http://man7.org/linux/man-pages/man1/perf-stat.1.html
-a, --all-cpus
system-wide collection from all CPUs (default if no target is
specified)
In this "system-wide" mode perf stat
(and perf record too) will count events on (or profile for record
) all CPUs in the system. When used without additional argument of command
, perf will run until interrupted by Ctrl-C. With argument of command
, perf will count/profile until the command works. Typical usage is
perf stat -a sleep 10 # Profile counting every CPU for 10 seconds
perf record -a sleep 10 # Profile with cycles every CPU for 10 seconds to perf.data
For getting stats of single command use single process profiling (without -a option)
perf stat python3 test.py
For profiling (perf record
) you may run without -a option; or you may use -a and later do some manual filtering in perf report, focusing only on the pids/tids/dsos of your application (This can be very useful if command to profile uses some interprocess requests to other daemons to do lot of CPU work).
--per-core, -A, -C <cpulist>, --per-socket
options are only for system-wide -a
mode. Try --per-thread
with -p pid
attach to process option.
来源:https://stackoverflow.com/questions/43894497/why-does-perf-show-that-sleep-takes-all-cores