perf | 易学教程

Two TLB-miss per mmap/access/munmap

Question

    for (int i = 0; i < 100000; ++i) {
        int *page = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        page[0] = 0;
        munmap(page, PAGE_SIZE);
    }

I expect to get ~100000 dTLB-store-misses in userspace, one per iteration (and also ~100000 page-faults and dTLB-load-misses in the kernel). Running the following command, the result is roughly 2x what I expect. I would appreciate it if someone could clarify why this is the case:

    perf stat -e dTLB-store-misses:u ./test

Performance
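The loop in the question omits its headers and the definition of PAGE_SIZE. A minimal sketch of a self-contained version follows, under stated assumptions: the function name `touch_fresh_pages` is hypothetical, and the page size is queried with `sysconf(_SC_PAGESIZE)` in place of the question's PAGE_SIZE constant. It only reproduces the workload and does not explain the 2x count.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page, write to it once, and unmap
 * it, `iterations` times. Returns the number of iterations that completed,
 * or -1 if the page size could not be determined. */
long touch_fresh_pages(long iterations)
{
    long page_size = sysconf(_SC_PAGESIZE); /* assumption: stands in for PAGE_SIZE */
    if (page_size <= 0)
        return -1;

    long done = 0;
    for (long i = 0; i < iterations; ++i) {
        int *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (page == MAP_FAILED) {
            perror("mmap");
            break;
        }
        /* First write to a fresh mapping; the volatile access keeps the
         * compiler from dropping the store. */
        *(volatile int *)page = 0;
        if (munmap(page, (size_t)page_size) != 0) {
            perror("munmap");
            break;
        }
        ++done;
    }
    return done;
}
```

Calling `touch_fresh_pages(100000)` from `main` and running the binary under the `perf stat -e dTLB-store-misses:u` command from the question should reproduce the measurement setup.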

Sharing experience debugging high-concurrency performance

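Introduction

In April I came across an interview question, reportedly asked by a Tencent campus-recruiting interviewer: in a multithreaded, high-concurrency environment, if a bug shows up on average only once in a million runs, how do you debug it? The original Zhihu thread is: "Tencent intern interview: how should these two questions be answered?" (Programming). Unfortunately, many of the answers on Zhihu attack the validity of the question itself. I was not the interviewer, but I think it is a very good interview question. Of course, it is only a bonus question: failing to answer costs nothing, while a good answer shows that the candidate's problem-solving approach and ability are above the average for new graduates.

I wrote the paragraph above because I believe most backend server developers may run into this kind of bug, and even for those who have not, a question like this pushes people to keep thinking and summarizing. Coincidentally, in April I also hit a similar and even more serious bug. It was a deep pit I had dug myself, and unless it was filled in, the whole project could not go live. More than a month has passed since then, so while I have the time I want to write a proper summary, and I hope some of the experience and tools mentioned here are of some help.

Project background

We made some deep modifications to the nginx event framework and the openssl protocol stack to improve nginx's compute performance for full HTTPS handshakes. Because stock nginx performs RSA computation on the local CPU, the single-core capacity for the ECDHE_RSA algorithm is only about 400 qps. Concurrent performance in early tests was very low: even with 24 cores enabled, it could not exceed 10,000. The core functionality was completed at the end of last year, and offline testing found no problems. After optimization, performance improved several times over. To test the maximum performance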

system call hardware performance counters ubuntu

I am working on a project and would like to obtain the performance counter values (cache, TLB, etc.) of a system call (e.g. read()) before and after the execution of a file. I tried doing this using perf on Ubuntu but was not able to get any results. Is there a way to do it using perf, or maybe some other tool? Thanks for the help.

         3.329057 task-clock (msec)         #  0.714 CPUs utilized
               16 context-switches          #  0.005 M/sec
                0 cpu-migrations            #  0.000 K/sec
              257 page-faults               #  0.077 M/sec
        1,983,212 cycles                    #  0.596 GHz
        1,352,902 stalled-cycles-frontend   # 68.22% frontend cycles idle
        1,080,180 stalled-cycles
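The figures after the `#` in the perf stat output above are ratios of the raw counts. As a worked sketch of that arithmetic, the two hypothetical helpers below (the names `avg_ghz` and `frontend_idle_pct` are illustrative, not part of perf) recompute the GHz and "frontend cycles idle" columns from the counts in the excerpt.

```c
#include <stdio.h>

/* Average clock rate in GHz: cycles divided by the task-clock time.
 * task_clock_ms is in milliseconds, so cycles / (ms * 1e6) yields GHz. */
double avg_ghz(double cycles, double task_clock_ms)
{
    return cycles / (task_clock_ms * 1e6);
}

/* Share of all cycles, in percent, during which the front end was stalled. */
double frontend_idle_pct(double stalled_frontend_cycles, double cycles)
{
    return 100.0 * stalled_frontend_cycles / cycles;
}

/* Prints both derived figures for one set of counts. */
void print_derived(double cycles, double task_clock_ms, double stalled_frontend)
{
    printf("%.3f GHz, %.2f%% frontend cycles idle\n",
           avg_ghz(cycles, task_clock_ms),
           frontend_idle_pct(stalled_frontend, cycles));
}
```

With the counts from the excerpt, `print_derived(1983212, 3.329057, 1352902)` gives roughly 0.596 GHz and 68.22%, matching the columns perf printed.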

Which perf events can use PEBS?

Question

I want to understand which events can have the precise modifier on my CPU (Sandy Bridge). The Intel Software Developer's Manual (Table 18-32. PEBS Performance Events for Intel Microarchitecture Code Name Sandy Bridge) contains only the following events: INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED. And SandyBridge_core_V15.json lists the same events with PEBS > 0. However there are some examples of

perf, a fairly good tool for testing program performance

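When facing a problematic program, it is best to work top-down: first get an overall picture of the various event statistics while the program runs, then dig into the details in specific directions. Do not dive straight into trivial details, or you will miss the forest for the trees. When optimizing your own code, CPU-bound and IO-bound programs are handled differently:

CPU bound: a program that spends most of its time using the CPU.
IO bound: easy to infer from the definition of CPU bound.

The perf stat command reports overall statistics for a process.

/*******************************************************************************/
$ perf stat ./Joseph_ring

Performance counter stats for './Joseph_ring':

        19.755435 task-clock                #  0.000 CPUs utilized
              429 context-switches          #  0.022 M/sec
                5 CPU-migrations            #  0.000 M/sec
              137 page-faults               #  0.007 M/sec
       27,255,530 cycles                    #  1.380 GHz
    <not counted> stalled-cycles-frontend
    <not counted> stalled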

perf mem -D report

Question

I was using perf mem -t load record "commands" to profile system memory access latency. Afterwards, I ran perf mem -D report and got the following results:

    [root@mdtm-server wenji]# perf mem -D report
    # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
    2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
    2054 2054 0xffffffff81321d6e 0xffff880c7fc87d44 7 0x68100142 /lib/modules/3.12.23/build/vmlinux:ghes_copy_tofrom_phys

What does
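One way to read the DSRC column is to split it into bit fields. The sketch below assumes that DSRC is the raw `perf_mem_data_src` value and that the fields follow the layout in `linux/perf_event.h` (op: 5 bits, level: 14 bits, snoop: 5 bits, lock: 2 bits, dTLB: 7 bits, starting at bit 0). The struct and function names are hypothetical, and the shifts are written out by hand so the code does not depend on a particular kernel header version.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the decoded bit fields. */
struct mem_data_src {
    unsigned op;    /* bits 0-4:   type of access (0x02 = load) */
    unsigned lvl;   /* bits 5-18:  memory hierarchy level flags */
    unsigned snoop; /* bits 19-23: snoop mode */
    unsigned lock;  /* bits 24-25: lock instruction */
    unsigned dtlb;  /* bits 26-32: TLB access flags */
};

/* Assumption: field offsets mirror union perf_mem_data_src. */
struct mem_data_src decode_data_src(uint64_t raw)
{
    struct mem_data_src d;
    d.op    = (unsigned)(raw & 0x1f);
    d.lvl   = (unsigned)((raw >> 5) & 0x3fff);
    d.snoop = (unsigned)((raw >> 19) & 0x1f);
    d.lock  = (unsigned)((raw >> 24) & 0x3);
    d.dtlb  = (unsigned)((raw >> 26) & 0x7f);
    return d;
}

/* Flag names for the level field, lowest bit first (PERF_MEM_LVL_*). */
static const char *const lvl_names[14] = {
    "NA", "HIT", "MISS", "L1", "LFB", "L2", "L3", "LOC_RAM",
    "REM_RAM1", "REM_RAM2", "REM_CCE1", "REM_CCE2", "IO", "UNC"
};

/* Prints the name of every flag set in the level field. */
void print_mem_level(unsigned lvl)
{
    for (unsigned bit = 0; bit < 14; ++bit)
        if (lvl & (1u << bit))
            printf("%s ", lvl_names[bit]);
    printf("\n");
}
```

Under these assumptions, the two DSRC values in the output above decode to level flags 0x42 (HIT and L3) for the sample with weight 49 and 0x0a (HIT and L1) for the sample with weight 7, both with op 0x02 (load).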

Logging Memory Access Footprint

Question

I found mtrace by Dr. Clements. Although it is useful, it does not work properly in the situation I need. I intend to use the recorded trace to understand memory access patterns in different scenarios. Can someone share related experience? Any suggestion will be appreciated.

0313 update: I'm trying to use qemu-mtrace to boot Ubuntu 16.04 with linux-mtrace (3.8.0), but it only shows several error messages and terminates. I hope some tool is able to log every access.

    $ ./qemu-system-x86_64 -mtrace-enable

perf-tools: a quick trial

perf-tools is a collection of performance tools written by performance expert Brendan Gregg on top of perf and ftrace. It provides analysis tools for most areas: I/O, networking, system calls, and more.

[Reference diagram]

Installation

Clone the code:

    git clone --depth 1 https://github.com/brendangregg/perf-tools

Basic usage

Check I/O latency:

    ./iolatency -Q

Result:

    ./iolatency -Q
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
           1 -> 2       : 0        |                                      |
           2 -> 4       : 0        |                                      |
           4 -> 8       : 0        |                                      |
           8 -> 16      : 2        |######################################|

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
    ^C

Notes

Event-based sampling with the perf userland tool and PEBS

Question

I'm doing event-based sampling with the perf userland tool. The objective is to find out where certain performance-impacting events, like branch misses and cache misses, are occurring on a larger system I'm working on. Now, something like perf record -a -e branch-misses:pp -- sleep 5 works perfectly: the PEBS counting mode triggered by the 'pp' modifier is really accurate when collecting the IP in the samples. Unfortunately, when I try to do the same for cache-misses, i.e. perf record

Linux perf command for cache references

Question

I want to measure the cache miss rate of my code. We can use perf list to show the supported events. My desktop has an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz processor, and perf list contains cache-references and cache-misses, like this:

    cpu-cycles OR cycles                               [Hardware event]
    stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
    stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
    instructions                                       [Hardware event]
    cache-references                                   [Hardware event]
    cache-misses                                       [Hardware event]

I