perf | 易学教程

Measure page faults from a c program


Question: I am comparing a few system calls where I read from and write to memory. Is there any API defined to measure page faults (pages in/out) in C? I found the library libperfstat.a, but it is for AIX; I couldn't find anything for Linux. Edit: I am aware of the time and perf stat commands in Linux; I am just exploring whether there is anything I can use inside the C program.
Answer 1: There is the getrusage function (SVr4, 4.3BSD, POSIX.1-2001; but not all fields are defined in the standard). In Linux there are

linux perf: how to interpret and find hotspots


I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from the sampling-based method of perf. What I did:

    perf record -g -p $(pidof someapp)
    perf report -g -n

Now I see something like this:

    + 16.92%  kdevelop  libsqlite3.so.0.8.6  [.] 0x3fe57
    + 10.61%  kdevelop  libQtGui.so.4.7.3    [.] 0x81e344
    +  7.09%  kdevelop  libc-2.14.so         [.] 0x85804
    +  4.96%  kdevelop  libQtGui.so.4.7.3    [.] 0x265b69
    +  3.50%  kdevelop  libQtCore.so.4.7.3   [.] 0x18608d
    +  2.68%  kdevelop  libc-2.14.so         [.] memcpy
    + 1

Hardware cache events and perf


Question: When I run perf list I see a bunch of Hardware Cache Events, as follows:

    $ perf list | grep 'cache event'
      L1-dcache-load-misses    [Hardware cache event]
      L1-dcache-loads          [Hardware cache event]
      L1-dcache-stores         [Hardware cache event]
      L1-icache-load-misses    [Hardware cache event]
      LLC-load-misses          [Hardware cache event]
      LLC-loads                [Hardware cache event]
      LLC-store-misses         [Hardware cache event]
      LLC-stores               [Hardware cache event]
      branch-load-misses       [Hardware cache event]
      branch-loads             [Hardware cache

how to interpret perf iTLB-loads,iTLB-load-misses


Question: I have a test case to observe perf iTLB-loads and iTLB-load-misses with

    perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p 22479

and get the output:

    Performance counter stats for process id '22479':

        1,262,817  dTLB-loads
           13,950  dTLB-load-misses   # 1.10% of all dTLB cache hits
               75  iTLB-loads
            6,882  iTLB-load-misses   # 9176.00% of all iTLB cache hits

    3.999720948 seconds time elapsed

I have no idea how to interpret this: iTLB-loads is only 75, but iTLB-load-misses is 6,882?! lscpu shows: Intel

Troubleshooting memory leaks: Show me your Memory


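Java has a curious property: every now and then you find yourself checking on memory. (Of course, any good engineer should pay attention to memory.) Here is today's scenario: an application has been running for a while when the ECS monitoring raises an alert that memory usage is high. What do you do? Over time, memory keeps climbing (without ever reaching 100%). What do you do? Everything needs evidence: an alert saying memory is tight does not make it so, and you have to verify it yourself. How do you confirm a memory problem? This matters a great deal. Below are several ways to inspect memory problems (believe them or not):

1. Use top and similar tools to get an overview of system memory

top: to look at memory, press M to sort by memory usage and the culprit shows up immediately. See online resources for the specific commands. Brief usage of top:

Usage: top [-] [d] [p] [q] [c] [C] [S] [s] [n]

Options:
d: sets the interval between two screen refreshes. The user can change it with the s interactive command.
p: monitors only the process with the given process ID.
q: makes top refresh without any delay. If the caller has superuser privileges, top runs at the highest possible priority.
S: enables cumulative mode.
s: runs top in secure mode. This removes the potential dangers of interactive commands.
i: makes top hide idle and zombie processes.
c: shows the full command line rather than just the command name.

Common interactive commands:
Ctrl+L: erases and redraws the screen.
k: terminates a process. The system prompts the user for the PID of the process to terminate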

Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths


I was playing with the code in this answer, slightly modifying it:

    BITS 64
    GLOBAL _start
    SECTION .text
    _start:
        mov ecx, 1000000
    .loop:
        ;T is a symbol defined with the CLI (-DT=...)
        TIMES T imul eax, eax
        lfence
        TIMES T imul edx, edx
        dec ecx
        jnz .loop
        mov eax, 60 ;sys_exit
        xor edi, edi
        syscall

Without the lfence, the results I get are consistent with the static analysis in that answer. When I introduce a single lfence, I'd expect the CPU to execute the imul edx, edx sequence of the k-th iteration in parallel with the imul eax, eax sequence of the next (k+1-th) iteration. Something like this

Unknown events in nodejs/v8 flamegraph using perf_events


I am trying to do some nodejs profiling using Linux perf_events as described by Brendan Gregg here. The workflow is as follows:

1. Run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
3. Fold the stacks using the stackcollapse-perf.pl script from this repository.
4. Generate an SVG flame graph using the flamegraph.pl script.

I get the following result (which looks really nice at first): The problem is that there are a lot of [unknown] elements, which I suppose should be my nodejs
