perf

AMD perf events

时光总嘲笑我的痴心妄想 submitted on 2019-12-10 10:05:49
Question: I am trying to use perf on a device with an AMD CPU, but I can't find any information on how to measure, say, cache misses on AMD. I read that you need to pass -e rNNN, where NNN is the hex code of the event, but I couldn't find a table listing those codes. Could you help me with this? There seems to be no information on the internet at all. The perf manual does contain some links, but they are no longer valid. Answer 1: Check perf list
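For the -e rNNN form mentioned in the question, here is a minimal sketch of how such a raw code is commonly assembled. It assumes the usual x86 layout that the kernel reports under /sys/bus/event_source/devices/cpu/format/: the unit mask in bits 8-15, the low eight bits of the event select in bits 0-7, and, on AMD, the upper four bits of a 12-bit event select in bits 32-35. The function name raw_event_code and the event numbers are illustrative only; the real numbers have to come from the vendor manual for your CPU family.

```python
def raw_event_code(event_select, unit_mask=0):
    """Build an 'rNNN' string for perf's -e option.

    Assumed layout (check the format/ directory in sysfs on your machine):
      bits 0-7   : low 8 bits of the event select
      bits 8-15  : unit mask
      bits 32-35 : high 4 bits of the event select (12-bit AMD events)
    """
    if not 0 <= event_select <= 0xFFF:
        raise ValueError("event select must fit in 12 bits")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    low = event_select & 0xFF
    high = (event_select >> 8) & 0xF
    config = low | (unit_mask << 8) | (high << 32)
    return "r{:x}".format(config)


# Illustrative inputs, not a recommendation for any particular CPU.
print(raw_event_code(0xD1, 0x20))  # 8-bit event select with a unit mask
print(raw_event_code(0x1C0))       # hypothetical 12-bit event select
```

The resulting string would be passed as, for example, perf stat -e r20d1 ./program. Where perf list already offers a named event, that is usually the easier route.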

perf stat gives different number of instruction for every run

本秂侑毒 submitted on 2019-12-10 04:33:50
Question: I ran perf on the following empty program:
    #include <stdio.h>
    int main() { }
After compiling it and running perf stat ./a.out, I got the following output (along with other data such as the number of cycles, task-clock, etc.):
    418,869 instructions # 0.87 insns per cycle
The number of instructions changes on every perf run against the same ELF binary. What I actually need is the number of instructions in a particular function I wrote. So I will be subtracting the above number from the

Can't sample hardware cache events with linux perf

早过忘川 submitted on 2019-12-09 19:25:17
Question: For some reason, I can't sample (perf record) hardware cache events:
    # perf record -e L1-dcache-stores -a -c 100 -- sleep 5
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.607 MB perf.data (~26517 samples) ]
    # perf script
but I can count them (perf stat):
    # perf stat -e L1-dcache-stores -a -- sleep 5
    Performance counter stats for 'sleep 5':
    711,781 L1-dcache-stores
    5.000842990 seconds time elapsed
I tried on different CPUs, OS versions (and kernel

Haswell microarchitecture don't have Stalled-cycles-backend in perf

泄露秘密 submitted on 2019-12-08 19:44:21
Question: I installed perf on a Haswell CPU (Intel Core i7-4790), but perf list includes neither "stalled-cycles-frontend" nor "stalled-cycles-backend". I checked http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html and found no performance events relevant to stalled-cycles-backend in Table 19-7 (Non-Architectural Performance Events in the Processor Core of 4th Generation Intel Core Processors). So my question is: how can I measure stalled

Get PEBS data linear address from perf

心不动则不痛 submitted on 2019-12-08 11:14:42
Question: I'm trying to use perf to measure L3 misses (LLC misses) with PEBS. Here is the command:
    perf record -d -e cpu/event=0xd1,umask=0x20/ppu -c 1 test
When perf finished, I used perf script -F ip,sym,addr to check the result. According to the Intel SDM, Vol. 3B, Table 18-55, the PEBS record contains a field named Data Linear Address, which holds the address of the load or the destination of the store; that is what I need. My question is whether the addr field I specified in perf script is the same as Data Linear

PEBS records much less memory-access samples than actually present

不问归期 submitted on 2019-12-08 08:08:21
Question: I have been trying to log the memory accesses made by a program using perf and PEBS counters. My intention was to log all of the memory accesses made by a program (I chose programs from SPEC CPU2006). By tweaking certain parameters, I seem to record many more samples than there actually are for the program. I know, as has been said previously, that it is hard to record all of the memory-access samples, but leaving that aside, I want to know how PEBS can record more samples than there

What does +-# after percent of cache misses mean in perf stat?

大城市里の小女人 submitted on 2019-12-08 07:50:38
Question: I used the command
    perf stat --repeat 100 -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./avx2ADD
and the output follows. What does +- 8.93% for cache-misses mean when the percentage of cache misses is 4.010 %?
    32,425 cache-references ( +- 0.54% )
    1,300 cache-misses # 4.010 % of all cache refs ( +- 8.93% )
    538,839 cycles ( +- 0.28% )
    520,056 instructions # 0.97 insns per cycle ( +- 0.22% )
    98,720 branches ( +- 0.20% )
    95 faults ( +- 0.12% )
    0 migrations (
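As a hedged illustration of the two percentages in this question: as far as I understand perf stat, the "+-" figure printed with --repeat is the run-to-run spread of that counter (the standard deviation of the mean, shown relative to the mean), whereas 4.010 % is simply cache-misses divided by cache-references. The sketch below computes the first quantity for a hypothetical list of per-run counts; the function name and the sample numbers are made up, and the formula is an assumption about perf's behaviour, not a quotation of its source.

```python
import math
import statistics


def relative_stderr_percent(samples):
    """Standard error of the mean, as a percentage of the mean.

    Assumption: this mirrors what perf stat --repeat reports as "+- x%".
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    mean = statistics.mean(samples)
    if mean == 0:
        return 0.0
    # Sample standard deviation (n - 1 in the denominator), then scaled
    # down by sqrt(n) to get the standard error of the mean.
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return 100.0 * stderr / mean


# Hypothetical cache-miss counts from four repeated runs.
runs = [100, 110, 90, 100]
print("+- {:.2f}%".format(relative_stderr_percent(runs)))
```

Read this way, "1,300 cache-misses ( +- 8.93% )" would mean the average count is uncertain by roughly 116 misses across the 100 runs. It says nothing about the 4.010 % miss ratio itself.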

Monitoring of the CLFLUSH assembly instruction

谁都会走 submitted on 2019-12-08 04:16:33
Question: I am interested in monitoring the CLFLUSH instruction in real time, either system-wide or for a specific process. The platform I am using is Linux 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux on an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz. Currently I am trying to do this with perf top / perf stat, but I am not able to filter on this specific instruction. Any ideas would be greatly appreciated. Source: https://stackoverflow.com/questions/53359764/monitoring-of-the-clflush

performance monitoring for subset of process execution

半城伤御伤魂 submitted on 2019-12-07 22:56:59
Question: I intend to collect statistics for a Linux application over a small subset of its execution. This subset can be defined as the first n instructions or the first n cycles. For the defined subset, we are interested in statistics such as branch-prediction accuracy, cache hit rates, and the IPC of the core. The perf tool looks like the best bet for such monitoring. However, the way to specify a subset in perf is by running a command that supplies the subset information. Example: If I want to collect

Why does it take so many instructions to run an empty program?

こ雲淡風輕ζ submitted on 2019-12-07 09:14:43
Question: I recently learned about the perf command in Linux. I decided to run some experiments, so I created an empty C program and measured how many instructions it took to run:
    echo 'int main(){}'>emptyprogram.c && gcc -O3 emptyprogram.c -o empty
    perf stat ./empty
This was the output:
    Performance counter stats for './empty':
    0.341833 task-clock (msec) # 0.678 CPUs utilized
    0 context-switches # 0.000 K/sec
    0 cpu-migrations # 0.000 K/sec
    112 page-faults # 0.328 M/sec
    1,187,561 cycles # 3.474 GHz
    1
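When comparing runs like the one quoted above, it can help to pull the counters out of the perf stat text. The following is a rough sketch that assumes the default human-readable layout shown in the question (a number with comma thousands separators, then the event name). The helper name parse_perf_stat is hypothetical, and the sample text reuses only the complete lines from the excerpt.

```python
import re

# A counter line starts with a number (optionally with commas or a decimal
# part), followed by whitespace and an event name such as "page-faults".
COUNTER_LINE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+([A-Za-z][\w.:/-]*)")


def parse_perf_stat(text):
    """Map event name -> numeric value for each counter line in the text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            value, name = match.groups()
            counters[name] = float(value.replace(",", ""))
    return counters


sample = """
 Performance counter stats for './empty':

          0.341833      task-clock (msec)         #    0.678 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               112      page-faults               #    0.328 M/sec
         1,187,561      cycles                    #    3.474 GHz
"""

counters = parse_perf_stat(sample)
# cycles divided by task-clock (in ms, scaled to ns) gives the GHz column.
ghz = counters["cycles"] / (counters["task-clock"] * 1e6)
print(counters["cycles"], round(ghz, 3))
```

For scripted use, perf stat's machine-readable -x option is likely a sturdier input than the default layout, which varies between perf versions and locales.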