perf

Getting user-space stack information from perf

ぃ、小莉子 submitted on 2019-12-03 05:21:47
Question: I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server, and it isn't simple to associate disk I/O back to a particular back-end and query. I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity. It's easy to record block I/O requests and completions with, e.g.: sudo perf record -g -T -u postgres -e 'block:block_rq_*' and the

Profiling sleep times with perf

混江龙づ霸主 submitted on 2019-12-03 05:00:35
Question: I was looking for a way to find out where my program spends its time. I read the perf tutorial and tried to profile sleep times as described there. I wrote the simplest possible program to profile: #include <unistd.h> int main() { sleep(10); return 0; } Then I executed it with perf: $ sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0

ubuntu 12.10 perf stat <not supported> cycles

前提是你 submitted on 2019-12-03 03:29:44
The system I use is ubuntu-12.10-desktop-amd64. I installed perf through apt-get install linux-tools linux-tools-common linux-tools-3.5.0-40 . When I use perf list , it lists all the events as expected. But when I use perf stat , the result seems abnormal. Running perf stat ls gives: Performance counter stats for 'ls': 3.988508 task-clock # 0.678 CPUs utilized 172 context-switches # 0.043 M/sec 0 CPU-migrations # 0.000 K/sec 276 page-faults # 0.069 M/sec <not supported> cycles <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend <not supported> instructions <not supported>

Why does perf stat show “stalled-cycles-backend” as <not supported>?

◇◆丶佛笑我妖孽 submitted on 2019-12-03 03:10:14
Question: Running perf stat ls shows this: Performance counter stats for 'ls': 1.388670 task-clock # 0.067 CPUs utilized 2 context-switches # 0.001 M/sec 0 cpu-migrations # 0.000 K/sec 266 page-faults # 0.192 M/sec 3515391 cycles # 2.531 GHz 2096636 stalled-cycles-frontend # 59.64% frontend cycles idle <not supported> stalled-cycles-backend 2927468 instructions # 0.83 insns per cycle # 0.72 stalled cycles per insn 615636 branches # 443.328 M/sec 22172 branch-misses # 3.60% of all branches 0.020657192
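The annotations after each `#` in the perf stat output above are ratios of the raw counters. The following is a minimal sketch of that arithmetic, assuming task-clock is reported in milliseconds; the function and variable names are hypothetical, and this illustrates the relationships between the numbers, not perf's actual implementation.

```python
# Raw counter values copied from the perf stat excerpt above.
TASK_CLOCK_MS = 1.388670        # assumed to be milliseconds of CPU time
CYCLES = 3_515_391
STALLED_FRONTEND = 2_096_636
INSTRUCTIONS = 2_927_468
BRANCHES = 615_636
BRANCH_MISSES = 22_172


def derived_metrics():
    """Recompute the ratios perf stat prints next to each counter."""
    seconds = TASK_CLOCK_MS / 1000.0
    return {
        # cycles per second of task-clock, expressed in GHz
        "ghz": CYCLES / seconds / 1e9,
        # share of cycles in which the frontend was stalled
        "frontend_idle_pct": 100.0 * STALLED_FRONTEND / CYCLES,
        # instructions retired per cycle (IPC)
        "insns_per_cycle": INSTRUCTIONS / CYCLES,
        # stalled cycles charged to each instruction
        "stalled_per_insn": STALLED_FRONTEND / INSTRUCTIONS,
        # share of branches that were mispredicted
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```

Rounded to the precision perf uses, these ratios agree with the annotations in the excerpt. Note that stalled-cycles-backend gets no ratio because the counter itself is reported as <not supported>.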

utf8 <-> utf16: codecvt poor performance

Anonymous (unverified) submitted on 2019-12-03 03:05:02
Question: I'm looking over some of my old (and exclusively win32-oriented) stuff and thinking about making it more modern/portable, i.e. reimplementing some widely reusable parts in C++11. One of these parts is converting between utf8 and utf16. In the Win32 API I'm using MultiByteToWideChar / WideCharToMultiByte , and I'm trying to port that stuff to C++11 using sample code from here: https://stackoverflow.com/a/14809553 . The result is: Release build (compiled by MSVS 2013, run on Core i7 3610QM): stdlib = 1587.2 ms, Win32 = 127.2 ms. Debug build: stdlib = 5733.8 ms

Linux perf events: cpu-clock and task-clock - what is the difference

可紊 submitted on 2019-12-03 03:03:16
Question: The Linux perf tools (some time ago named perf_events ) have several built-in universal software events. The two most basic of them are cpu-clock and task-clock (internally called PERF_COUNT_SW_CPU_CLOCK and PERF_COUNT_SW_TASK_CLOCK ). The problem is that they lack a proper description. User ysdx reports that man perf_event_open has a short description: PERF_COUNT_SW_CPU_CLOCK This reports the CPU clock, a high-resolution per-CPU timer. PERF_COUNT_SW_TASK_CLOCK This reports a clock count specific to

How does linux's perf utility understand stack traces?

Anonymous (unverified) submitted on 2019-12-03 02:49:01
Question: Linux's perf utility is famously used by Brendan Gregg to generate flamegraphs for C/C++, JVM code, Node.js code, etc. Does the Linux kernel natively understand stack traces? Where can I read more about how a tool is able to introspect the stack traces of processes, even if the processes are written in completely different languages? Answer 1: There is a short introduction to stack traces in perf by Gregg: http://www.brendangregg.com/perf.html 4.4 Stack Traces Always compile with frame pointers. Omitting frame pointers is an evil compiler

Getting user-space stack information from perf

Anonymous (unverified) submitted on 2019-12-03 02:45:02
Question: I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server, and it isn't simple to associate disk I/O back to a particular back-end and query. I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity. It's easy to record block I/O requests and completions with, e.g.: sudo perf record -g -T -u postgres -e 'block:block_rq_*' and the user-space pid is recorded, but there's no kernel or

Thread Utilization profiling on linux

微笑、不失礼 submitted on 2019-12-03 02:33:19
Linux perf-tools are great for finding hotspots in CPU cycles and optimizing those hotspots. But once some parts are parallelized, it becomes difficult to spot the sequential parts, since they take up significant wall time but not necessarily many CPU cycles (the parallel parts are already burning those). To avoid the XY problem: my underlying motivation is to find sequential bottlenecks in multi-threaded code. The parallel phases can easily dominate the aggregate CPU-cycle statistics even though the sequential phases dominate wall time due to Amdahl's law . For Java applications this is fairly
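The gap between CPU-cycle share and wall-time share described above can be made concrete with Amdahl's law. The numbers below are hypothetical and chosen only for illustration: a parallel fraction p = 0.9 of the single-threaded work, n = 16 threads, and perfect scaling of the parallel part.

```latex
% Amdahl's law: speedup with parallel fraction p on n threads
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
     = \frac{1}{0.1 + \frac{0.9}{16}}
     = \frac{1}{0.15625} = 6.4

% Share of wall time spent in the sequential phase
\frac{1 - p}{(1 - p) + \frac{p}{n}} = \frac{0.1}{0.15625} = 0.64

% Share of aggregate CPU time spent in the sequential phase
% (the parallel work still costs p in total CPU time)
\frac{1 - p}{(1 - p) + p} = 0.10
```

Under these assumptions the sequential phase accounts for 64% of wall time but only 10% of the aggregate CPU time, which is why a cycle-based profile can hide it.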