perf

How can I get perf to find symbols in my program?

天涯浪子 submitted on 2019-12-02 17:10:01
When using perf report, I don't see any symbols for my program; instead I get output like this:
    $ perf record /path/to/racket ints.rkt 10000
    $ perf report --stdio
    # Overhead  Command   Shared Object      Symbol
    # ........  ........  .................  ......
    #
        70.06%  ints.rkt  [unknown]          [.] 0x5f99b8
        26.28%  ints.rkt  [kernel.kallsyms]  [k] 0xffffffff8103d0ca
         3.66%  ints.rkt  perf-32046.map     [.] 0x7f1d9be46650
This is fairly uninformative. The relevant program is built with debugging symbols, and the sysprof tool shows the appropriate symbols, as does Zoom, which I think is using perf under the hood. Note that

Why does perf stat show “stalled-cycles-backend” as <not supported>?

ε祈祈猫儿з submitted on 2019-12-02 17:09:17
Running perf stat ls shows this:
    Performance counter stats for 'ls':
           1.388670 task-clock                #    0.067 CPUs utilized
                  2 context-switches          #    0.001 M/sec
                  0 cpu-migrations            #    0.000 K/sec
                266 page-faults               #    0.192 M/sec
            3515391 cycles                    #    2.531 GHz
            2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle
    <not supported> stalled-cycles-backend
            2927468 instructions              #    0.83  insns per cycle
                                              #    0.72  stalled cycles per insn
             615636 branches                  #  443.328 M/sec
              22172 branch-misses             #    3.60% of all branches
        0.020657192 seconds time elapsed
Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware,
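The right-hand column of the perf stat output quoted above is derived from the raw counts on the left. The sketch below recomputes those ratios in Python. The formulas are inferred from the numbers in this listing, not taken from the perf source; the dictionary keys and the `derived_metrics` helper are illustrative names, and task-clock is assumed to be in milliseconds.

```python
# Raw counts copied from the perf stat output quoted in the question.
# Assumption: task-clock is reported in milliseconds.
counts = {
    "task_clock_ms": 1.388670,
    "cycles": 3515391,
    "stalled_cycles_frontend": 2096636,
    "instructions": 2927468,
    "branches": 615636,
    "branch_misses": 22172,
}


def derived_metrics(c):
    """Recompute the annotated ratios from raw counter values (hypothetical helper)."""
    return {
        # cycles per second, expressed in GHz: cycles / (ms * 1e-3 s) / 1e9
        "ghz": c["cycles"] / (c["task_clock_ms"] * 1e6),
        # share of cycles in which the frontend was stalled
        "frontend_idle_pct": 100.0 * c["stalled_cycles_frontend"] / c["cycles"],
        "insns_per_cycle": c["instructions"] / c["cycles"],
        # only the frontend count is available here, since backend is <not supported>
        "stalled_cycles_per_insn": c["stalled_cycles_frontend"] / c["instructions"],
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
    }


metrics = derived_metrics(counts)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.3f}")
```

Rounded to the precision perf prints, these values match the annotations in the listing (2.531 GHz, 59.64%, 0.83, 0.72, 3.60%). No backend ratio can be computed because that counter was not available.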

Linux perf events: cpu-clock and task-clock - what is the difference

不打扰是莪最后的温柔 submitted on 2019-12-02 17:04:55
The Linux perf tools (formerly named perf_events) have several built-in universal software events. The two most basic are cpu-clock and task-clock (internally called PERF_COUNT_SW_CPU_CLOCK and PERF_COUNT_SW_TASK_CLOCK). The problem is that they are poorly documented. User ysdx reports that man perf_event_open has a short description:
    PERF_COUNT_SW_CPU_CLOCK
        This reports the CPU clock, a high-resolution per-CPU timer.
    PERF_COUNT_SW_TASK_CLOCK
        This reports a clock count specific to the task that is running.
But the description is hard to understand. Can somebody give authoritative

Understanding perf detail when comparing two different implementations of a BFS algorithm

无人久伴 submitted on 2019-12-02 06:57:40
The results below were measured using perf on a compute server with 32 cores. I know my implementation is unoptimized, but that is deliberate because I want to make comparisons. I understand that graph algorithms tend to have low locality, which researchers try to address. I'm unclear about the results, though. The elapsed time is misleading. My implementation runs through a graph with about 4 million nodes in about 10 seconds and spends the rest of the time preprocessing. The optimized version uses the same input and traverses it about 10 times, each traversal taking less than a second, so it's really just preprocessing time. I'm not

perf report shows this function “__memset_avx2_unaligned_erms” has overhead. Does this mean memory is unaligned?

丶灬走出姿态 submitted on 2019-12-02 06:05:49
I am trying to profile my C++ code using the perf tool. The implementation contains code with SSE/AVX/AVX2 instructions. In addition, the code is compiled with the -O3 -mavx2 -march=native flags. I believe the __memset_avx2_unaligned_erms function is a libc implementation of memset. perf shows that this function has considerable overhead. The function name suggests that memory is unaligned; however, in the code I am explicitly aligning the memory using GCC's __attribute__((aligned (x))). What might be the reason for this function to have significant overhead, and why the unaligned version is

Performance analysis: function granularity

社会主义新天地 submitted on 2019-12-01 13:53:54
Performance analysis on Linux, part 3: perf - Zhihu https://zhuanlan.zhihu.com/p/22194920
A brief look at the Linux perf performance analysis tool and flame graphs - Zhihu https://zhuanlan.zhihu.com/p/54276509
Source: https://www.cnblogs.com/yuanjiangw/p/11689445.html

Logging all memory accesses of any executable/process in Linux

佐手、 submitted on 2019-12-01 11:19:27
I have been looking for a way to log all memory accesses of a process/execution in Linux. I know similar questions have been asked here before, such as "Logging memory access footprint of whole system in Linux", but I wanted to know whether there is any non-instrumentation tool that does this. I am not looking at QEMU/Valgrind for this purpose, since they would be a bit slow and I want as little overhead as possible. I looked at perf mem and PEBS events like cpu/mem-loads/pp for this purpose, but I see that they collect only sampled data and I actually wanted the trace

Python subprocess running in background before returning output

你离开我真会死。 submitted on 2019-12-01 07:12:58
I have some Python code that I want to debug with perf. For that purpose I want to use subprocess. The following command returns instruction-related information about a process until the command is exited via Ctrl+C:
    perf stat -p <my_pid>
Now, I want to run this from Python code in the background, up to some point where I want to be able to terminate it and print the command's output. To show what I mean:
    x = subprocess.call(["perf","stat","-p",str(GetMyProcessID())])
    .. CODE TO DEBUG ..
    print x # I want to terminate subprocess here and output 'x'
Now, I want to determine what to do at
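One possible way to get the behaviour described in this question is sketched below in Python 3. It is not taken from the original thread. `subprocess.call` blocks until the child exits, so the sketch swaps in `subprocess.Popen`, sends SIGINT (the signal Ctrl+C delivers) when measurement should stop, and reads the output afterwards. `start_background` and `stop_and_collect` are hypothetical helper names, and the sketch assumes a POSIX system and that perf stat writes its summary to stderr.

```python
import signal
import subprocess


def start_background(cmd):
    """Start cmd without blocking and capture both output streams as text."""
    return subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


def stop_and_collect(proc, timeout=10):
    """Send SIGINT (what Ctrl+C sends), then return (stdout, stderr)."""
    proc.send_signal(signal.SIGINT)
    try:
        return proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child ignored SIGINT; kill it and read whatever it produced.
        proc.kill()
        return proc.communicate()


# Usage sketch (assumes perf is installed and allowed to attach to the process):
#   import os
#   proc = start_background(["perf", "stat", "-p", str(os.getpid())])
#   ... code to measure ...
#   _, report = stop_and_collect(proc)
#   print(report)
```

The report is read from the second element of the returned tuple because of the stderr assumption above. If perf has not finished attaching when the signal arrives, the counters may cover less than the intended region.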

How to narrow down perf.data to a time sub interval

痞子三分冷 submitted on 2019-12-01 05:19:16
I use Linux perf (perf_events) to produce a perf.data file with timestamps. How can I generate a report of all the events in a time subinterval [i-start, i-end]? Can I maybe narrow down perf.data to a perf_subinterv.data file with only the events in [i-start, i-end]? I need to do this to analyze short intervals (2s - 6s) of poor performance that occur every 5 minutes or so.
Zulan
Most perf tools, including perf report, support filtering by time:
    --time::
        Only analyze samples within given time window: <start>,<stop>. Times have the format seconds.microseconds. If start is not given (i.e., time string is ',x
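As a rough illustration of the --time option quoted above, the Python sketch below formats a `<start>,<stop>` window in seconds.microseconds and builds a perf report command line around it. `perf_time_window` and `perf_report_command` are hypothetical helpers, not part of perf. The start and stop values are assumed to be timestamps on the same clock perf recorded in perf.data, not wall-clock times.

```python
def perf_time_window(start=None, stop=None):
    """Format a --time value as '<start>,<stop>' in seconds.microseconds.

    Either bound may be None to leave that side of the window open.
    """
    if start is None and stop is None:
        raise ValueError("give at least one of start or stop")
    if start is not None and stop is not None and stop < start:
        raise ValueError("stop must not be earlier than start")

    def fmt(t):
        # Six decimal places gives microsecond resolution.
        return "" if t is None else f"{t:.6f}"

    return f"{fmt(start)},{fmt(stop)}"


def perf_report_command(data_file, start=None, stop=None):
    """Build an argument list for perf report restricted to one time window."""
    return [
        "perf", "report",
        "-i", data_file,
        "--stdio",
        "--time", perf_time_window(start, stop),
    ]


# Example: a 6-second window starting at timestamp 1234.5
print(" ".join(perf_report_command("perf.data", 1234.5, 1240.5)))
```

The resulting list can be passed to `subprocess.run` once per slow interval, which avoids writing a separate trimmed perf.data file for each one.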