perf | 易学教程

How does Linux perf calculate the cache-references and cache-misses events

Question: I am confused by the perf event cache-misses and the events L1-icache-load-misses, L1-dcache-load-misses, and LLC-load-misses. When I ran perf stat on all of them, the results did not seem consistent:
$ sudo perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations,L1-dcache-load-misses,L1-dcache-loads,L1-dcache-stores,L1-icache-load-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetches ./my_app
523,288,816 cache-references (22.89%)
205,331,370
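To compare counters like the ones in this excerpt, it can help to pull the numbers out of the human-readable perf stat output first. The following is a minimal sketch, not part of the original question: the helper names parse_counter_line and miss_ratio are hypothetical, and the regular expression only assumes the line shape visible in the excerpts on this page (a count, an event name, an optional "#" comment, and an optional trailing percentage). As far as I know, that trailing percentage is the share of the run during which the event was actually being counted when perf multiplexes more events than there are hardware counters, so counts with a low percentage are scaled estimates.

```python
import re

# One counter line of human-readable `perf stat` output, for example
#   "523,288,816 cache-references (22.89%)"
#   "2,628,646,296 cycles # 3.023 GHz"
# The pattern is an assumption based on those two shapes only.
_LINE = re.compile(
    r"^\s*(?P<count>[\d,]+(?:\.\d+)?)\s+"             # count, with thousands separators
    r"(?P<event>[A-Za-z0-9_.:/-]+)"                   # event name
    r"(?:\s+#.*?)?"                                   # optional "# ..." comment
    r"(?:\s+\((?P<pct>\d+(?:\.\d+)?)%\))?\s*$"        # optional "(NN.NN%)" suffix
)


def parse_counter_line(line):
    """Return (event, count, pct) for a counter line, or None for any other line."""
    m = _LINE.match(line)
    if m is None:
        return None
    raw = m.group("count").replace(",", "")
    count = float(raw) if "." in raw else int(raw)
    pct = m.group("pct")
    return m.group("event"), count, (float(pct) if pct is not None else None)


def miss_ratio(misses, references):
    """Fraction of references that missed, or None when nothing was counted."""
    return misses / references if references else None


if __name__ == "__main__":
    print(parse_counter_line("523,288,816 cache-references (22.89%)"))
```

Note that cache-misses and the L1/LLC events are separate hardware events, so a ratio is only meaningful between counters that describe the same cache level, for instance cache-misses against cache-references.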

Why doesn't perf report cache misses?

Question: According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:
[joel@panda goog]$ perf stat ./hash
Performance counter stats for './hash':
869.447863 task-clock # 0.997 CPUs utilized
92 context-switches # 0.106 K/sec
4 cpu-migrations # 0.005 K/sec
1,041 page-faults # 0.001 M/sec
2,628,646,296 cycles # 3.023 GHz
819,269,992 stalled-cycles-frontend # 31.17% frontend cycles idle
132,355,435 stalled

linux perf: how to interpret and find hotspots

Question: I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from perf's sampling-based method. What I did:
perf record -g -p $(pidof someapp)
perf report -g -n
Now I see something like this:
+ 16.92% kdevelop libsqlite3.so.0.8.6 [.] 0x3fe57
+ 10.61% kdevelop libQtGui.so.4.7.3 [.] 0x81e344
+ 7.09% kdevelop libc-2.14.so [.] 0x85804
+ 4.96% kdevelop libQtGui.so.4.7.3 [.]

Unknown events in nodejs/v8 flamegraph using perf_events

Question: I am trying to do some nodejs profiling using Linux perf_events as described by Brendan Gregg here. The workflow is as follows:
1. Run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
3. Fold stacks using the stackcollapse-perf.pl script from this repository.
4. Generate an svg flame graph using the flamegraph.pl script.
I get the following result (which looks really nice at the
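The capture and rendering steps of the workflow above can be sketched as a small helper that only assembles the commands. This is an illustration under assumptions, not the asker's script: the function name flamegraph_commands, the scripts_dir and out_svg parameters, and their defaults are hypothetical, and the intermediate perf script step (which dumps the recorded samples as text for stackcollapse-perf.pl) comes from the usual FlameGraph instructions rather than from the excerpt.

```python
import os
import shlex


def flamegraph_commands(pid, seconds=30, freq=99, scripts_dir=".", out_svg="flame.svg"):
    """Build (record_argv, render_pipeline) without running anything.

    scripts_dir is assumed to hold stackcollapse-perf.pl and flamegraph.pl;
    both the directory and the output file name are illustrative defaults.
    """
    # Sample the given process at `freq` Hz with call graphs for `seconds` seconds.
    record_argv = ["perf", "record", "-F", str(freq), "-p", str(pid),
                   "-g", "--", "sleep", str(seconds)]

    fold = os.path.join(scripts_dir, "stackcollapse-perf.pl")
    render = os.path.join(scripts_dir, "flamegraph.pl")
    # Dump samples as text, fold the stacks, then render the SVG.
    render_pipeline = "perf script | {} | {} > {}".format(
        shlex.quote(fold), shlex.quote(render), shlex.quote(out_svg))
    return record_argv, render_pipeline


if __name__ == "__main__":
    argv, pipeline = flamegraph_commands(1234)
    print(" ".join(argv))
    print(pipeline)
```

The record command could then be passed to subprocess.run(argv, check=True) and the pipeline to a shell, typically with the privileges perf needs to attach to the target process.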

Conditional compilation based on functionality in Linux kernel headers

Question: Consider the case where I'm using some functionality from the Linux headers exported to user space, such as perf_event_open from <linux/perf_event.h>. The functionality offered by this API has changed over time, as members have been added to perf_event_attr, such as perf_event_attr.cap_user_time. How can I write source that compiles and uses these new features if they are available locally, but falls back gracefully and doesn't use them if they aren't? In particular, how can I

Monitoring Linux performance events for cgroups

Question: I am currently trying to monitor some hardware events on my system (6 hardware counters and 24 CPUs) as well as on its cgroups. I take the LLC loads and cpu-cycles events as an example. To this end I use the perf command. However, when I consider an idle cgroup (actually a docker container running only bash) and run perf for either cgroup or system-wide monitoring, I seem to get approximately the same number of cpu-cycles in both cases: $ sudo perf stat

Is it possible to extract instruction specific energy consumption in a program? [closed]

Question: What I mean is: given a source code file, is it possible to extract energy consumption levels for a particular code block or a single instruction, using a tool like perf?
Answer 1: There are tools for measuring power consumption (see @jww's comment for links), but they don't even try

perf report showing “__libm_pow_l9”

Question: I am using perf to profile my program, which makes heavy use of exp() and pow(). The code was compiled using icc -g -fno-omit-frame-pointer test.c and profiled with perf record -g ./a.out, followed by perf report -g 'graph,0.5,caller'. perf showed that the two functions __libm_exp_l9() and __libm_pow_l9() consume a considerable amount of computational power. Are they just aliases for exp() and pow(), respectively? Or are there any suggestions for how to read the report?

Perf Monitoring for rdtsc dynamically

Question: Is there a way to monitor assembly instructions dynamically, in "real time", using perf? I have seen that if I use perf record or perf top and then select one of the recorded functions, I can see its assembly instructions. But can I directly monitor specific assembly instructions such as rdtsc or clflush, for example how often a process executes them within a certain period? I am using Debian 9 on Skylake and also on Haswell. sudo uname -a Linux bla 4.9.0-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27

How can a specific application be monitored by perf inside the kvm?

Question: I have an application that I want to monitor via perf stat while it runs inside a kvm VM. After Googling, I found that perf kvm stat can do this. However, running the command sudo perf kvm stat record -p appPID fails and only prints the help text:
...
usage: perf kvm stat record [<options>]
-p, --pid <pid> record events on existing process id
-t, --tid <tid> record events on existing thread id
-r, --realtime <n> collect data with this RT SCHED_FIFO priority
--no