Why does perf show that sleep takes all cores?

吃可爱长大的小学妹 submitted on 2019-12-12 04:08:21

Question


I am trying to familiarize myself with perf and run it against various programs I wrote.

When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event). Here's the example output:

perf stat  -a --per-core python3 test.py

Performance counter stats for 'system wide':

    S0-C0           1       19004.951263      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C0           1              5,582      context-switches                                              (100.00%)
    S0-C0           1                 19      cpu-migrations                                                (100.00%)
    S0-C0           1              3,746      page-faults                                                 
    S0-C0           1    <not supported>      cycles                   
    S0-C0           1    <not supported>      stalled-cycles-frontend  
    S0-C0           1    <not supported>      stalled-cycles-backend   
    S0-C0           1    <not supported>      instructions             
    S0-C0           1    <not supported>      branches                 
    S0-C0           1    <not supported>      branch-misses            
    S0-C1           1       19004.950059      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C1           1              6,752      context-switches                                              (100.00%)
    S0-C1           1                 25      cpu-migrations                                                (100.00%)
    S0-C1           1                935      page-faults                                                 
    S0-C1           1    <not supported>      cycles                   
    S0-C1           1    <not supported>      stalled-cycles-frontend  
    S0-C1           1    <not supported>      stalled-cycles-backend   
    S0-C1           1    <not supported>      instructions             
    S0-C1           1    <not supported>      branches                 
    S0-C1           1    <not supported>      branch-misses            

      19.004688019 seconds time elapsed

It even shows that a simple sleep command takes two cores on my computer, and I can't explain this. I understand that the OS scheduler can reassign the active core for any process, but in that case CPU utilization would reflect it.

Can anyone explain this?

Thanks!


Answer 1:


According to the man page of the perf stat subcommand, the -a option profiles the full system: http://man7.org/linux/man-pages/man1/perf-stat.1.html

   -a, --all-cpus
       system-wide collection from all CPUs (default if no target is
       specified)

In this "system-wide" mode, perf stat (and perf record too) counts events on all CPUs in the system (or, for record, profiles all of them). When run without a command argument, perf runs until interrupted with Ctrl-C. When given a command, perf counts/profiles until that command exits. Typical usage is

perf stat -a sleep 10      # Count events on every CPU for 10 seconds
perf record -a sleep 10    # Sample cycles on every CPU for 10 seconds, writing to perf.data
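
This would also explain the numbers in the question: in system-wide mode the task-clock counter appears to follow each CPU for the whole measurement window, whether or not the profiled command ran there, so every core's value ends up close to the elapsed wall time. A quick check using the figures posted in the question (a sketch only; the variable names are made up for illustration):

```shell
# Values copied from the perf output in the question.
elapsed_s=19.004688019      # "seconds time elapsed"
core0_ms=19004.951263       # task-clock (msec) reported for S0-C0
core1_ms=19004.950059       # task-clock (msec) reported for S0-C1

# CPUs utilized = task-clock (msec) / elapsed time (msec)
util0=$(awk -v t="$core0_ms" -v e="$elapsed_s" 'BEGIN { printf "%.3f", t / (e * 1000) }')
util1=$(awk -v t="$core1_ms" -v e="$elapsed_s" 'BEGIN { printf "%.3f", t / (e * 1000) }')

echo "S0-C0: $util0 CPUs utilized"
echo "S0-C1: $util1 CPUs utilized"
```

Both ratios round to the 1.000 shown in the question, which fits the reading that the figure describes the measurement window on each core rather than work done by test.py.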

To get stats for a single command, use single-process profiling (without the -a option):

perf stat python3 test.py

For profiling (perf record) you may run without the -a option, or you may use -a and later filter manually in perf report, focusing only on the pids/tids/dsos of your application. (This can be very useful if the profiled command makes interprocess requests to other daemons that do a lot of the CPU work.)

The --per-core, -A, -C <cpulist>, and --per-socket options are only for system-wide (-a) mode. To break counts down within a single process, try --per-thread together with -p <pid> to attach to that process.
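
As a rough sketch of that last suggestion, the helper below attaches to an already running process and reports task-clock per thread, using sleep only as a timer for the measurement window. The function name stat_threads and the 5-second default are assumptions made for illustration, not part of perf.

```shell
# Hypothetical helper: per-thread task-clock for an already running process.
# $1 = PID to attach to, $2 = seconds to measure (defaults to 5).
stat_threads() {
    pid=$1
    secs=${2:-5}
    # No -a here, so only the threads of the target PID are counted.
    # The trailing "sleep" just bounds how long perf keeps counting.
    perf stat --per-thread -e task-clock -p "$pid" -- sleep "$secs"
}

# Example: stat_threads 1234 10    (1234 is a placeholder PID)
```

Because the function is only defined here and not called, nothing is measured until it is invoked with a real PID.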



Source: https://stackoverflow.com/questions/43894497/why-does-perf-show-that-sleep-takes-all-cores
