perf | 易学教程

Perf Stat vs Perf Record

Question: I am confused about the difference between perf record and perf stat when it comes to counting events like page-faults, cache-misses, and anything else from perf list. I have two questions below. The answer to "Question 1" might also answer "Question 2", but I wrote both out explicitly in case it doesn't. Question 1: It is my understanding that perf stat gets a "summary" of counts, but when used with the -I option it gets the counts at the specified millisecond interval. With this
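The interval behaviour described in Question 1 can be illustrated with a small sketch. It assumes output in the shape of `perf stat -I 1000 -x, -e page-faults`, where each line starts with a timestamp, a count, a unit, and an event name; real output carries further trailing fields, which the parser ignores. The sample lines are made up for illustration and are not measurements.

```python
import csv
import io
from collections import defaultdict

# Illustrative (made-up) interval lines: timestamp, count, unit, event.
SAMPLE = """\
1.000211345,120,,page-faults
2.000398712,35,,page-faults
3.000512008,7,,page-faults
"""

def sum_intervals(text):
    """Add up the per-interval counts for each event name."""
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # not a counter line
        count = row[1].replace(",", "")
        if not count.isdigit():
            continue  # e.g. "<not counted>"
        totals[row[3]] += int(count)
    return dict(totals)

totals = sum_intervals(SAMPLE)
print(totals)
```

Each interval line is a delta for that window, so adding them up gives the same kind of total that a plain perf stat run prints once at the end.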

Why does perf show that sleep takes all cores?

Question: I am trying to familiarize myself with perf by running it against various programs I wrote. When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event). Here's the example output:

    perf stat -a --per-core python3 test.py

    Performance counter stats for 'system wide':

    S0-C0  1  19004.951263  task-clock (msec)   #  1.000 CPUs utilized  (100.00%)
    S0-C0  1         5,582  context-switches                            (100.00%)
    S0-C0  1            19  cpu-migrations                              (100.00%)
    S0-C0  1         3,746  page
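As a rough sketch of where the "CPUs utilized" figure comes from: it is the task-clock time divided by the elapsed wall-clock time. The task-clock value below is taken from the output above; the elapsed time is an assumption, since the excerpt is cut off before the "seconds time elapsed" line.

```python
def cpus_utilized(task_clock_msec: float, elapsed_sec: float) -> float:
    """Ratio of accumulated CPU time to wall-clock time."""
    return (task_clock_msec / 1000.0) / elapsed_sec

task_clock_msec = 19004.951263  # from the S0-C0 line above
elapsed_sec = 19.004951263      # assumed: not shown in the excerpt

ratio = cpus_utilized(task_clock_msec, elapsed_sec)
print(f"{ratio:.3f} CPUs utilized")
```

In system-wide mode (-a), the clock appears to be counted on every CPU for the whole measurement window, whether or not the profiled program ran there, which would explain why each listed core reports a ratio close to 1.0.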

ARM Headers to Get Proper Call Stacks

Question: I am currently optimizing Linux-based software running on an ARM processor. The optimizations are mostly in the form of ARM and ARM NEON assembler functions. To profile the software I use perf record and flame graphs. However, once I introduce the assembler functions, they do not stack on top of the functions that call them but instead appear in seemingly random places. My question is: what should I include in my functions for them to appear properly in the call stacks?

Using PEBS and Linux Perf to Count the number of CPU cycles passed to execute X number of instructions

Question: I want to do something like this: after 100 million instructions have retired, query the Linux perf hardware CPU cycles counter and record the value in a file. I want to use this code to characterize the performance of applications and benchmark programs during different phases of execution. I have a hint that I need to set up Intel PEBS so that it overflows after 100 million instructions have retired, and then query the Linux perf hardware CPU cycles counter. Any pointer on where to start and if someone has already

How does perf use the offcore events?

Question: Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE events. This can be easily determined, as discussed here. However, these offcore events require writing certain values to certain MSRs to actually specify a particular event. perf seems to use an array called something like snb_hw_cache_extra_regs to specify which values to write to which MSRs. I would like to know how this array is used.

How to calculate MIPS using perf stat

Question: The following answer to "Benchmarking - How to count number of instructions sent to CPU to find consumed MIPS" suggests that: perf stat ./my_program on Linux will use CPU performance counters to record how many instructions it ran, and how many core clock cycles it took. (And how much CPU time it used, and will calculate MIPS for you). An example generates the following output, which does not contain calculated MIPS information:

    Performance counter stats for './hello.py':

    1452.607792  task-clock (msec)
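When the output does not include a MIPS figure, it can be derived by hand from two of the reported values: the instruction count and the CPU time. A minimal sketch follows; the task-clock value is the one shown above, while the instruction count is a hypothetical placeholder because the excerpt is cut off before that line.

```python
def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second of CPU time."""
    return instructions / seconds / 1e6

task_clock_msec = 1452.607792  # from the perf stat output above
instructions = 2_905_215_584   # hypothetical: not shown in the excerpt

result = mips(instructions, task_clock_msec / 1000.0)
print(f"{result:.1f} MIPS")
```

Dividing by task-clock gives MIPS per second of CPU time; dividing by the elapsed wall-clock time instead would give a lower figure for a program that spends time sleeping or waiting on I/O.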

How can I read performance counters from the kernel?

Question: I have been using the Linux perf tool in user space. I want to write code that reads performance counters for a thread every time it does a context switch. The steps required are:

    1) Get a mechanism to read the performance counter registers.
    2) Call step (1) from the scheduler after every context switch.

I am stuck at step (1), as I could not figure out which functions to call to read the performance registers or how to describe an event while doing so. I tried going through the docs

How to Configure and Sample Intel Performance Counters In-Process

Question: In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):

    results[] = ...
    for (iteration = 0; iteration < num_iterations; iteration++) {
        pctr_start = sample_pctr();
        the_benchmark();
        pctr_stop = sample_pctr();
        results[iteration] = pctr_stop - pctr_start;
    }

FWIW, the performance counter I am thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes (In

Tech Share | Profiling Software Using Perf and Flame Graphs

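Author: Agustín. Translator: 孟维克. Original: https://www.percona.com/blog/2019/11/20/profiling-software-using-perf-and-flame-graphs/

In this blog post, we will look at how to use perf and flame graphs together. They are used to generate a graphical view of the functions being called by the software of our choice. Here we use the Percona fork, but the approach extends to any software whose stack traces can be resolved. Before continuing, note that, as with any profiling tool, you should not run this in production unless you know what you are doing.

Installing the required packages

For simplicity, we use CentOS 7, but the steps should be the same for Debian-based distributions (the only difference is using apt-get install linux-tools-$(uname -r) in place of the yum command).

Install perf:

    SHELL> sudo yum install -y perf

Get the FlameGraph package:

    SHELL> mkdir -p ~/src
    SHELL> cd ~/src
    SHELL> git clone https://github.com/brendangregg/FlameGraph

Everything is installed! Let's continue.

Capturing samples

Flame graphs are a way of visualizing data, so we need some samples to use as a basis. There are three ways to do this (note that here we use -p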

Ubuntu 16.04 LTS - How to enable symbols for the perf tool

Question: I'm trying to gather some profiling data for my app, and I'm using the perf tool and Flame Graphs for that. I'm following the instructions provided in this SlideShare deck: https://www.slideshare.net/brendangregg/java-performance-analysis-on-linux-with-flame-graphs Below are the commands that I'm running:

    1. sudo perf record -F 997 -a -g
    2. sudo perf script > out.stacks01

When I run the second command, it displays the messages below: Failed to open /tmp/perf-9931.map, continuing without symbols. no symbols
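For context on the /tmp/perf-9931.map message: perf looks for a per-process map file that a JIT runtime (or an agent attached to it) can write, where each line holds a start address, a size, and a symbol name, with both numbers in hexadecimal. The sketch below parses that format and resolves an address to a name. The addresses and symbol names in the sample are hypothetical, made up for illustration.

```python
import bisect

def parse_perf_map(text):
    """Parse 'START SIZE symbolname' lines (START and SIZE are hex)."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None

# Hypothetical map contents; a real file is generated by the runtime.
SAMPLE_MAP = """\
7f0000001000 40 Lcom/example/Hypothetical;::run
7f0000001040 20 Interpreter
"""

entries = parse_perf_map(SAMPLE_MAP)
print(resolve(entries, 0x7F0000001010))
```

When no such file exists for a process, perf has nothing to look up JIT-compiled addresses in, so those frames stay unresolved in the perf script output.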