perf

How to use the Linux `perf` tool to generate an “Off-CPU” profile

混江龙づ霸主 submitted on 2019-11-30 00:36:25
Brendan D. Gregg (author of the DTrace book) describes an interesting variant of profiling, "Off-CPU" profiling (and the Off-CPU Flame Graph; slides 2013, p112-137), which shows where a thread or application was blocked (not executing on a CPU, but waiting for I/O, waiting in the page-fault handler, or descheduled because CPU resources were short): This time reveals which code-paths are blocked and waiting while off-CPU, and for how long exactly. This differs from traditional profiling which often samples the activity of threads at a given interval, and (usually) only examines threads if they are executing work on-CPU. He also
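
To make the idea of "which code paths are blocked, and for how long" concrete, here is a minimal sketch of the bookkeeping behind an off-CPU profile. It does not use perf itself: the event list, its tuple layout, and the stack strings are all hypothetical stand-ins for whatever a scheduler trace would provide.

```python
from collections import defaultdict

# Hypothetical trace: (timestamp_seconds, event, thread_id, stack) tuples.
# "out" means the thread left the CPU, "in" means it started running again.
events = [
    (0.000, "out", 1, "main;read_file;read"),
    (0.004, "out", 2, "main;worker;lock"),
    (0.010, "in", 1, None),
    (0.012, "in", 2, None),
    (0.020, "out", 1, "main;read_file;read"),
    (0.025, "in", 1, None),
]


def off_cpu_by_stack(events):
    """Sum the time spent off-CPU, keyed by the stack seen when blocking."""
    blocked = {}  # thread_id -> (time it went off-CPU, stack at that moment)
    totals = defaultdict(float)
    for ts, kind, tid, stack in sorted(events, key=lambda e: e[0]):
        if kind == "out":
            blocked[tid] = (ts, stack)
        elif kind == "in" and tid in blocked:
            start, blocked_stack = blocked.pop(tid)
            totals[blocked_stack] += ts - start
    return dict(totals)


totals = off_cpu_by_stack(events)
for stack, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{seconds * 1000:.1f} ms  {stack}")
```

An on-CPU sampler would never see these intervals, because the threads are not running while they accumulate; that is the gap off-CPU profiling is meant to fill.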

perf stat for part of program

↘锁芯ラ submitted on 2019-11-29 04:32:02
Is it possible with perf to collect hardware counter statistics for only part of a program's execution? If so, how? likwid offers the feature of being able to define named regions, but it would be great if this was possible on systems with just perf installed. Some previous questions have returned relevant answers, but there are still some shortcomings: Using probe I get the same error and I'm using a slightly newer kernel (3.13). Are these fixes available in a newer version? Using perf_event_open I would like to maintain the flexibility to define events on the command line. I also took a peek

PERF STAT does not count memory-loads but counts memory-stores

时间秒杀一切 submitted on 2019-11-29 02:32:44
Linux kernel: 4.10.0-20-generic (also tried on 4.11.3). Ubuntu: 17.04. I have been trying to collect memory-access statistics using perf stat. I am able to collect stats for memory-stores, but the count for memory-loads comes back as 0. Below are the details for memory-stores:
perf stat -e cpu/mem-stores/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
158,115

perf_event_open - how to monitor multiple events

隐身守侯 submitted on 2019-11-29 01:59:16
Question: Does anyone know how to set up the perf_event_attr struct so that the PMU monitors multiple events (of different types) via perf_event_open()? For example, perf record -e cycles,faults ls uses two different event types (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE), but in the example on the perf_event_open man page, perf_event_attr.type can only be assigned a single value. Any suggestions would be appreciated, thanks! 20170208 Update: Thanks to @gudok for pointing me in a direction, but the result seems somewhat abnormal. Demo program

How to use the Linux `perf` tool to generate an “Off-CPU” profile

社会主义新天地 submitted on 2019-11-28 21:32:55
Question: Brendan D. Gregg (author of the DTrace book) describes an interesting variant of profiling, "Off-CPU" profiling (and the Off-CPU Flame Graph; slides 2013, p112-137), which shows where a thread or application was blocked (not executing on a CPU, but waiting for I/O, waiting in the page-fault handler, or descheduled because CPU resources were short): This time reveals which code-paths are blocked and waiting while off-CPU, and for how long exactly. This differs from traditional profiling which often samples the activity of

Performance analysis with perf on Linux, and exporting a flame graph

a 夏天 submitted on 2019-11-28 19:53:25
For installing perf, see this tutorial: Installing and using perf on debian/ubuntu
Record:
perf record -F 99 -a -g -- sleep 60
#perf record -F 99 -p PID -g -- sleep 60 # use -p to specify the pid
perf script > out.perf
Generate the flame graph:
# Download the flame graph generation project
git clone --depth 1 https://github.com/brendangregg/FlameGraph.git
# Fold the call stacks
FlameGraph/stackcollapse-perf.pl out.perf > out.folded
# Generate the flame graph
FlameGraph/flamegraph.pl out.folded > out.svg
Source: https://blog.csdn.net/zhangpeterx/article/details/100121853

What are perf cache events meaning?

China☆狼群 submitted on 2019-11-28 17:31:48
I am trying to figure out why a modified C program runs faster than its unmodified counterpart (I am adding a few lines of code that perform some additional work). In this context, I suspect "cache effects" (instruction cache) to be the main explanation. So I turned to the perf (https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool, but unfortunately I am not able to understand the meaning of its output regarding cache misses. Several cache-related events are provided: cache-references [Hardware event] cache-misses [Hardware event] L1-dcache-loads [Hardware cache event] L1

Why doesn't perf report cache misses?

做~自己de王妃 submitted on 2019-11-28 16:13:17
According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:
[joel@panda goog]$ perf stat ./hash
Performance counter stats for './hash':
869.447863 task-clock # 0.997 CPUs utilized
92 context-switches # 0.106 K/sec
4 cpu-migrations # 0.005 K/sec
1,041 page-faults # 0.001 M/sec
2,628,646,296 cycles # 3.023 GHz
819,269,992 stalled-cycles-frontend # 31.17% frontend cycles idle
132,355,435 stalled-cycles-backend # 5.04% backend cycles idle
4,515,152,198 instructions # 1.72 insns per cycle
# 0.18
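
The annotations after each `#` in the perf stat output quoted above are derived from the raw counts. The following is a small sketch of that arithmetic, using only the numbers from the excerpt; the variable and function names are made up for illustration, and the formulas are the usual interpretation of these columns rather than perf's actual source code.

```python
# Raw counts copied from the perf stat excerpt above.
task_clock_ms = 869.447863
cycles = 2_628_646_296
stalled_frontend = 819_269_992
stalled_backend = 132_355_435
instructions = 4_515_152_198


def derived_metrics(task_clock_ms, cycles, stalled_fe, stalled_be, instructions):
    """Recompute the ratios that perf stat prints next to the raw counters."""
    seconds_on_cpu = task_clock_ms / 1000.0
    return {
        # Cycles per second of CPU time, expressed in GHz.
        "ghz": cycles / seconds_on_cpu / 1e9,
        # Share of all cycles in which the front end / back end was stalled.
        "frontend_idle_pct": 100.0 * stalled_fe / cycles,
        "backend_idle_pct": 100.0 * stalled_be / cycles,
        # Instructions retired per cycle (IPC).
        "insns_per_cycle": instructions / cycles,
    }


metrics = derived_metrics(
    task_clock_ms, cycles, stalled_frontend, stalled_backend, instructions
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that none of these derived values involves cache counters: the default event set shown in the excerpt simply does not include any, which is the behaviour the question is asking about.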

An application's CPU usage has hit 100%: what should I do? (Part 3)

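笑着哭i submitted on 2019-11-28 15:21:26
An application's CPU usage has hit 100%: what should I do? (Part 3) 1. Introduction Hello, dear readers ^_^! Which metric do we most often use to describe a system's CPU performance? I suspect your answer is neither load average nor CPU context switches, but a more intuitive metric: CPU usage. CPU usage is a statistic of how the CPU is used per unit of time, shown as a percentage. So, for this most common and most familiar CPU metric, can you say how CPU usage is actually calculated? And can you tell apart the %user, %nice, %system, %iowait, %steal and so on shown by performance tools such as top and ps? 2. What is CPU usage? As a multitasking operating system, Linux divides each CPU's time into very short time slices and hands them out to tasks in turn through the scheduler, which creates the illusion that multiple tasks run at the same time. To keep track of CPU time, Linux triggers timer interrupts at a predefined tick rate (represented in the kernel as HZ) and uses the global variable jiffies to record the number of ticks since boot. Each timer interrupt increments jiffies by 1. The tick rate HZ is a configurable kernel option and can be set to 100, 250, 1000, and so on. Different systems may use different values; you can check the configured value by querying the kernel options in /boot/config. For example, on my system the tick rate is set to 1000, meaning 1000 timer interrupts are triggered per second. [root@localhost ~]# cat /etc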
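
The excerpt stops before giving a formula, so the following is only a rough sketch of one common way to turn tick counters into a CPU usage percentage: take two snapshots of the per-category counters and divide the non-idle increase by the total increase. The category names mirror the ones mentioned above (%user, %nice, %system, %iowait, %steal); the sample numbers and function names are made up.

```python
# Per-category tick counters; names are illustrative, values are made-up samples.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def cpu_usage_percent(prev, curr):
    """Busy percentage between two snapshots of cumulative tick counters."""
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    # Assumption: only "idle" counts as idle time. Some tools also treat
    # "iowait" as idle, which would give a lower usage figure.
    busy = total - deltas["idle"]
    return 100.0 * busy / total


def ticks_to_seconds(ticks, hz):
    """Convert a tick count to seconds for a given tick rate (ticks per second)."""
    return ticks / hz


prev = dict(zip(FIELDS, (1000, 0, 500, 8000, 100, 0, 0, 0)))
curr = dict(zip(FIELDS, (1300, 0, 600, 8500, 200, 0, 0, 0)))

usage = cpu_usage_percent(prev, curr)
print(f"CPU usage over the interval: {usage:.1f}%")
print(f"1000 ticks at a tick rate of 1000: {ticks_to_seconds(1000, 1000)} s")
```

Because the counters only ever grow, a single snapshot describes the average since boot; the difference between two snapshots is what gives usage over a recent interval.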

Good resources on how to program PEBS (Precise event based sampling) counters?

你。 submitted on 2019-11-28 12:53:34
I have been trying to log all memory accesses of a program, which, from what I have read, seems to be impossible. So I have been trying to see how far I can go toward logging at least a major portion of the memory accesses, if not all. To that end, I was looking to program the PEBS counters in such a way that I could see changes in the number of memory-access samples collected. I wanted to know if I can do this by modifying the counter-reset value of the PEBS counters. (Usually this goes to zero, but I want to set it to a higher value.) So I was looking to program these PEBS counters on my own. Has anybody had experience