perf

Measure page faults from a c program

非 Y 不嫁゛ submitted on 2019-11-27 01:41:49
Question: I am comparing a few system calls where I read from and write to memory. Is there any API defined to measure page faults (pages in/out) in C? I found the library libperfstat.a, but it is for AIX; I couldn't find anything for Linux. Edit: I am aware of the time and perf-stat commands in Linux; I am just exploring whether there is anything I can use inside the C program.

Answer 1: There is the getrusage function (SVr4, 4.3BSD, POSIX.1-2001; but not all fields are defined in the standard). In Linux there are
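A minimal sketch of the getrusage approach mentioned in the answer, assuming Linux, where the ru_minflt and ru_majflt fields of struct rusage are maintained. The helper name faults_while_touching and the buffer-touching workload are made up for illustration; substitute the code you actually want to measure.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: allocate len bytes, touch one byte per page, and
 * report how many minor and major page faults the process took meanwhile.
 * Returns 0 on success, -1 on failure. */
int faults_while_touching(size_t len, long *minor, long *major)
{
    struct rusage before, after;
    long page = sysconf(_SC_PAGESIZE);
    volatile char *buf;
    size_t i;

    if (page <= 0 || getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    buf = malloc(len);
    if (buf == NULL)
        return -1;

    /* Volatile stores keep the compiler from optimizing the touches away. */
    for (i = 0; i < len; i += (size_t)page)
        buf[i] = 1;

    if (getrusage(RUSAGE_SELF, &after) != 0) {
        free((void *)buf);
        return -1;
    }

    /* Both counters are cumulative for the process, so subtract. */
    *minor = after.ru_minflt - before.ru_minflt;
    *major = after.ru_majflt - before.ru_majflt;

    free((void *)buf);
    return 0;
}
```

Calling faults_while_touching(64 * 1024 * 1024, &minor, &major) from main and printing both values gives a rough per-region count. The exact numbers depend on the allocator and on kernel settings such as transparent huge pages, so treat them as indicative rather than exact.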

linux perf: how to interpret and find hotspots

大兔子大兔子 submitted on 2019-11-26 23:56:11
I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from perf's sampling-based method. What I did:

    perf record -g -p $(pidof someapp)
    perf report -g -n

Now I see something like this:

    + 16.92%  kdevelop  libsqlite3.so.0.8.6  [.] 0x3fe57
    + 10.61%  kdevelop  libQtGui.so.4.7.3    [.] 0x81e344
    +  7.09%  kdevelop  libc-2.14.so         [.] 0x85804
    +  4.96%  kdevelop  libQtGui.so.4.7.3    [.] 0x265b69
    +  3.50%  kdevelop  libQtCore.so.4.7.3   [.] 0x18608d
    +  2.68%  kdevelop  libc-2.14.so         [.] memcpy
    + 1

Hardware cache events and perf

♀尐吖头ヾ submitted on 2019-11-26 23:24:28
Question: When I run perf list I see a bunch of Hardware Cache Events, as follows:

    $ perf list | grep 'cache event'
    L1-dcache-load-misses    [Hardware cache event]
    L1-dcache-loads          [Hardware cache event]
    L1-dcache-stores         [Hardware cache event]
    L1-icache-load-misses    [Hardware cache event]
    LLC-load-misses          [Hardware cache event]
    LLC-loads                [Hardware cache event]
    LLC-store-misses         [Hardware cache event]
    LLC-stores               [Hardware cache event]
    branch-load-misses       [Hardware cache event]
    branch-loads             [Hardware cache
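For context, here is a sketch of how one of these event names can be expressed for the perf_event_open(2) system call, assuming the Linux UAPI header linux/perf_event.h is available. For PERF_TYPE_HW_CACHE events the config field packs a cache id, an operation id, and a result id. The helper names hw_cache_config and init_l1d_load_miss_attr are hypothetical. Which combinations a given CPU actually supports is not something this sketch can tell you.

```c
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: pack (cache, operation, result) into the config
 * layout documented for PERF_TYPE_HW_CACHE:
 *   cache_id | (op_id << 8) | (result_id << 16) */
uint64_t hw_cache_config(uint64_t cache_id, uint64_t op_id, uint64_t result_id)
{
    return cache_id | (op_id << 8) | (result_id << 16);
}

/* Hypothetical helper: describe the event that perf list shows as
 * L1-dcache-load-misses (L1 data cache, read operation, miss result). */
void init_l1d_load_miss_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HW_CACHE;
    attr->size = sizeof(*attr);
    attr->config = hw_cache_config(PERF_COUNT_HW_CACHE_L1D,
                                   PERF_COUNT_HW_CACHE_OP_READ,
                                   PERF_COUNT_HW_CACHE_RESULT_MISS);
    attr->disabled = 1;        /* start stopped; enable later via ioctl */
    attr->exclude_kernel = 1;  /* count user-space activity only */
    attr->exclude_hv = 1;
}
```

The filled-in attr would then be passed to perf_event_open through syscall(2), since glibc has no wrapper for it. By the same scheme, LLC-load-misses would use PERF_COUNT_HW_CACHE_LL in place of PERF_COUNT_HW_CACHE_L1D.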

how to interpret perf iTLB-loads,iTLB-load-misses

China☆狼群 submitted on 2019-11-26 23:18:29
Question: I have a test case in which I observe iTLB-loads and iTLB-load-misses with perf, using

    perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p 22479

and get the output:

    Performance counter stats for process id '22479':

        1,262,817 dTLB-loads
           13,950 dTLB-load-misses   #    1.10% of all dTLB cache hits
               75 iTLB-loads
            6,882 iTLB-load-misses   # 9176.00% of all iTLB cache hits

      3.999720948 seconds time elapsed

I have no idea how to interpret iTLB-loads being only 75 while iTLB-load-misses is 6,882. lscpu shows: Intel
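The percentages in that output are consistent with perf simply dividing the miss count by the load count: 6,882 / 75 = 9176.00% and 13,950 / 1,262,817 ≈ 1.10%. A minimal sketch of that arithmetic follows; the function name miss_percent is made up for illustration. It only reproduces the printed figures and says nothing about why the iTLB-loads counter is so small on this CPU.

```c
/* Hypothetical helper: ratio of misses to loads as a percentage, which
 * matches the figures printed next to the *-load-misses lines above.
 * Returns 0.0 when loads is zero to avoid dividing by zero. */
double miss_percent(unsigned long long misses, unsigned long long loads)
{
    if (loads == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)loads;
}
```

Because the two counters are independent hardware events, nothing forces the miss count to stay below the load count, so a ratio above 100% is arithmetically possible.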

Memory leak troubleshooting: Show me your Memory

会有一股神秘感。 submitted on 2019-11-26 23:11:02
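One curious thing about the Java language is that every so often you end up paying attention to memory. (Of course, any capable engineer should be paying attention to memory anyway.)

Today's scenario: an application has been running for a while, and ECS monitoring raises an alert because memory usage is high. What do you do? As time goes on, memory keeps climbing (yet never reaches 100%). What do you do?

Everything needs evidence. An alert saying memory is tight does not make it so; you have to verify it yourself. How do you confirm a memory problem? This matters a great deal! Here are several ways to inspect memory problems (believe them or not):

1. Use top and similar tools for an overview of system memory

top: for memory, press M to sort by memory usage and the culprit shows up immediately. See online references for the specific commands.

Brief usage of top:

Usage: top [-] [d] [p] [q] [c] [C] [S] [s] [n]

Options:
d: sets the interval between two screen refreshes. The user can also change it with the s interactive command.
p: monitors only the process with the given process ID.
q: makes top refresh without any delay. If the caller has superuser privileges, top runs at the highest possible priority.
S: enables cumulative mode.
s: runs top in secure mode, which removes the potential danger of interactive commands.
i: makes top hide idle and zombie processes.
c: shows the full command line instead of just the command name.

Common interactive commands:
Ctrl+L: erase and redraw the screen.
K: kill a process. The system prompts the user for the PID of the process to kill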

Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths

拈花ヽ惹草 submitted on 2019-11-26 19:09:48
I was playing with the code in this answer, slightly modifying it:

    BITS 64
    GLOBAL _start
    SECTION .text
    _start:
        mov ecx, 1000000
    .loop:
        ;T is a symbol defined with the CLI (-DT=...)
        TIMES T imul eax, eax
        lfence
        TIMES T imul edx, edx
        dec ecx
        jnz .loop
        mov eax, 60 ;sys_exit
        xor edi, edi
        syscall

Without the lfence, the results I get are consistent with the static analysis in that answer. When I introduce a single lfence, I'd expect the CPU to execute the imul edx, edx sequence of the k-th iteration in parallel with the imul eax, eax sequence of the next (k+1-th) iteration. Something like this

Unknown events in nodejs/v8 flamegraph using perf_events

允我心安 submitted on 2019-11-26 17:46:11
I am trying to do some nodejs profiling using Linux perf_events as described by Brendan Gregg here. The workflow is as follows:

1. Run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
3. Fold the stacks using the stackcollapse-perf.pl script from this repository.
4. Generate an SVG flame graph using the flamegraph.pl script.

I get the following result (which looks really nice at first):

The problem is that there are a lot of [unknown] elements, which I suppose should be my nodejs

Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths

时光总嘲笑我的痴心妄想 submitted on 2019-11-26 06:48:22
Question: I was playing with the code in this answer, slightly modifying it:

    BITS 64
    GLOBAL _start
    SECTION .text
    _start:
        mov ecx, 1000000
    .loop:
        ;T is a symbol defined with the CLI (-DT=...)
        TIMES T imul eax, eax
        lfence
        TIMES T imul edx, edx
        dec ecx
        jnz .loop
        mov eax, 60 ;sys_exit
        xor edi, edi
        syscall

Without the lfence, the results I get are consistent with the static analysis in that answer. When I introduce a single lfence, I'd expect the CPU to execute the imul edx, edx sequence of the k-th