perf

Perf Stat vs Perf Record

别等时光非礼了梦想. posted on 2019-12-12 18:42:57
Question: I am confused about the difference between perf record and perf stat when it comes to counting events like page-faults, cache-misses and anything else from perf list. I have two questions below; the answer to "Question 1" might also help answer "Question 2", but I wrote them out explicitly in case it doesn't. Question 1: It is my understanding that perf stat gets a "summary" of counts, but when used with the -I option it gets the counts at the specified millisecond interval. With this

Why does perf show that sleep takes all cores?

吃可爱长大的小学妹 posted on 2019-12-12 04:08:21
Question: I am trying to familiarize myself with perf and run it against various programs I wrote. When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event). Here's the example output:
    perf stat -a --per-core python3 test.py
    Performance counter stats for 'system wide':
    S0-C0  1  19004.951263  task-clock (msec)  #  1.000 CPUs utilized  (100.00%)
    S0-C0  1  5,582  context-switches  (100.00%)
    S0-C0  1  19  cpu-migrations  (100.00%)
    S0-C0  1  3,746  page

ARM Headers to Get Proper Call Stacks

∥☆過路亽.° posted on 2019-12-11 11:44:16
Question: I am currently optimizing Linux-based software running on an ARM processor. Those optimizations are mostly in the form of ARM and ARM NEON assembler functions. To profile the software I use perf record and flame graphs. However, once I introduce the assembler functions, they do not stack on top of the functions that call them but appear in seemingly random places. My question is therefore: what should I include in my functions for them to appear properly in the call stacks?

Using PEBS and Linux Perf to Count the number of CPU cycles passed to execute X number of instructions

送分小仙女□ posted on 2019-12-11 04:27:00
Question: I want to do something like this: after 100 million instructions have passed, query the Linux perf HW CPU cycles counter and record it in a file. I want to use this code to characterize the performance of applications/benchmark programs during different phases of program execution. I have a hint that I need to set up Intel PEBS so that it overflows after 100 million instructions have passed, and then query the Linux perf HW CPU cycles counter. Any pointer on where to start and if someone has already

How does perf use the offcore events?

空扰寡人 posted on 2019-12-11 04:11:30
Question: Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE. events. This can be easily determined as discussed here. However, these offcore events require writing certain values to certain MSR registers to actually specify a particular event. perf seems to be using an array called something like snb_hw_cache_extra_regs to specify what values to write to which MSR registers. I would like to know how this array is used.

How to calculate MIPS using perf stat

风格不统一 posted on 2019-12-11 04:10:02
Question: The following answer about "Benchmarking - How to count number of instructions sent to CPU to find consumed MIPS" suggests that: perf stat ./my_program on Linux will use CPU performance counters to record how many instructions it ran, and how many core clock cycles it took. (And how much CPU time it used, and will calculate MIPS for you). An example run generates the following output, which does not contain the calculated MIPS information. Performance counter stats for './hello.py': 1452.607792 task-clock (msec)
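If perf stat does not print a MIPS figure, it can be derived from two values it does report: the instructions count and the task-clock in milliseconds. The following is a minimal sketch of that arithmetic. The function name `mips` is hypothetical, and the instruction count in the example is an assumed value, because the excerpt above is cut off before the instructions line; only the task-clock value comes from the excerpt.

```python
def mips(instructions: int, task_clock_msec: float) -> float:
    """Return millions of instructions retired per second of CPU time.

    instructions:     the 'instructions' count reported by perf stat
    task_clock_msec:  the 'task-clock (msec)' value reported by perf stat
    """
    if task_clock_msec <= 0:
        raise ValueError("task-clock must be positive")
    seconds = task_clock_msec / 1000.0
    return instructions / seconds / 1e6


if __name__ == "__main__":
    # task-clock taken from the excerpt; the instruction count is an
    # ASSUMED value for illustration only (not from the original output).
    assumed_instructions = 2_905_215_584
    print(f"{mips(assumed_instructions, 1452.607792):.1f} MIPS")
```

Note that task-clock measures CPU time, not wall-clock time, so this figure describes throughput while the program was actually running on a CPU.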

How can I read performance counters from the kernel?

♀尐吖头ヾ posted on 2019-12-11 02:39:32
Question: I have been using the Linux perf tool in user space. I want to write code that reads performance counters for a thread every time it does a context switch. The steps required are: 1) Get a mechanism to read the performance counter registers. 2) Call step (1) from the scheduler after every context switch. I am stuck at step (1), as I could not figure out which functions to call for reading the performance registers and how to describe an event while doing it. I tried going through the docs

How to Configure and Sample Intel Performance Counters In-Process

蓝咒 posted on 2019-12-11 00:53:16
Question: In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):
    results[] = ...
    for (iteration = 0; iteration < num_iterations; iteration++) {
        pctr_start = sample_pctr();
        the_benchmark();
        pctr_stop = sample_pctr();
        results[iteration] = pctr_stop - pctr_start;
    }
FWIW, the performance counter I am thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes (In
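The structure of the sampling loop in the pseudo-code can be sketched in runnable form as follows. This is only an illustration of the loop shape: `sample_pctr` here is a stand-in that reads Python's monotonic clock in nanoseconds, which is an assumption made so the sketch runs anywhere. It is not a hardware performance counter and does not count core cycles; reading a real counter in-process would need a mechanism such as the perf_event_open system call, which is not shown.

```python
import time
from typing import Callable, List


def sample_pctr() -> int:
    # ASSUMPTION: stand-in for a hardware counter read. This returns
    # monotonic nanoseconds, NOT core cycles.
    return time.perf_counter_ns()


def run_benchmark(the_benchmark: Callable[[], None],
                  num_iterations: int) -> List[int]:
    """Run the_benchmark num_iterations times, recording a counter delta each time."""
    results: List[int] = []
    for _ in range(num_iterations):
        pctr_start = sample_pctr()
        the_benchmark()
        pctr_stop = sample_pctr()
        results.append(pctr_stop - pctr_start)
    return results


if __name__ == "__main__":
    # Hypothetical workload, used only to exercise the loop.
    deltas = run_benchmark(lambda: sum(range(100_000)), num_iterations=5)
    print(deltas)
```

Keeping one delta per iteration, rather than a single total, makes it possible to see warm-up effects and outliers across iterations.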

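Tech Share | Profiling Software Using Perf and Flame Graphs

ぐ巨炮叔叔 posted on 2019-12-10 17:28:48
Author: Agustín. Translator: 孟维克. Original: https://www.percona.com/blog/2019/11/20/profiling-software-using-perf-and-flame-graphs/
In this blog post, we will explore how to use perf and flame graphs together. They are used to generate a graph of the functions being called by the software of our choice. Here we use the Percona fork, but the approach extends to any software for which stack traces can be resolved. Before continuing, note that, as with any profiling tool, you should not run this in production unless you know what you are doing.
Installing the required packages
For simplicity, we use CentOS 7, but the steps should be the same on Debian-based distributions (the only difference is using apt-get install linux-tools-$(uname -r) instead of the yum command).
Install perf
    SHELL> sudo yum install -y perf
Get the flame graph package
    SHELL> mkdir -p ~/src
    SHELL> cd ~/src
    SHELL> git clone https://github.com/brendangregg/FlameGraph
Everything is installed! Let's continue.
Capturing samples
A flame graph is a way of visualizing data, so we need some samples to use as a basis. This can be done in three ways (note that here we use -p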

Ubuntu 16.04 LTS - How to enable symbols for the perf tool

心不动则不痛 posted on 2019-12-10 10:37:35
Question: I'm trying to gather some profiling data for my app, and I run the perf tool and Flame Graphs for that. I'm following the instructions provided in this slideshare: https://www.slideshare.net/brendangregg/java-performance-analysis-on-linux-with-flame-graphs Below are the commands that I'm running:
    1. sudo perf record -F 997 -a -g
    2. sudo perf script > out.stacks01
When I run the second command, it displays the messages below: Failed to open /tmp/perf-9931.map, continuing without symbols. no symbols
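The "Failed to open /tmp/perf-9931.map" message refers to perf's map file for JIT-compiled code, which perf looks for under /tmp/perf-<pid>.map. Each line of that file has the form START SIZE symbolname, where START and SIZE are hexadecimal without a 0x prefix. The sketch below only illustrates that file format. The helper names and the example symbols and addresses are hypothetical, and it writes to a temporary directory rather than /tmp; for a real JVM the file has to be produced by an agent running inside or attached to the process, which this sketch does not do.

```python
import os
import tempfile
from typing import Iterable, Tuple


def format_perf_map_line(start: int, size: int, name: str) -> str:
    """Format one perf map entry: hex start, hex size, symbol name."""
    return f"{start:x} {size:x} {name}"


def write_perf_map(path: str, entries: Iterable[Tuple[int, int, str]]) -> None:
    """Write (start, size, name) entries to path, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for start, size, name in entries:
            f.write(format_perf_map_line(start, size, name) + "\n")


if __name__ == "__main__":
    # Hypothetical addresses and symbol names, for illustration only.
    entries = [
        (0x7F00A000, 0x40, "Lcom/example/Foo;::bar"),
        (0x7F00A040, 0x1A0, "Lcom/example/Foo;::baz"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        # A real file would be named perf-<pid>.map for the profiled process.
        path = os.path.join(tmp, f"perf-{os.getpid()}.map")
        write_perf_map(path, entries)
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```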