perf

Two TLB-miss per mmap/access/munmap

旧时模样 posted on 2019-12-07 00:55:23
Question

    for (int i = 0; i < 100000; ++i) {
        int *page = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        page[0] = 0;
        munmap(page, PAGE_SIZE);
    }

I expect to get ~100000 dTLB-store-misses in userspace, one per iteration (and also ~100000 page-faults and dTLB-load-misses in the kernel). When I run the following command, the result is roughly 2x what I expect. I would appreciate it if someone could clarify why this is the case:

    perf stat -e dTLB-store-misses:u ./test

Performance

Lessons learned from debugging performance under high concurrency

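ε祈祈猫儿з posted on 2019-12-06 22:06:10
Introduction

In April I came across an interview question, reportedly asked by an interviewer in Tencent's campus recruiting: in a multithreaded, high-concurrency environment, if a bug shows up on average only once in a million runs, how do you debug it? The original Zhihu thread is: "Tencent intern interview: how should these two questions be answered? - Programming".

Unfortunately, many of the answers on Zhihu attack the validity of the question itself. I was not the interviewer, but I think it is a very good interview question. Of course, it is only a bonus question: failing to answer it costs nothing, and answering it well shows that the candidate's problem-solving approach and ability are above the average for new graduates.

I wrote the paragraph above because I believe most backend server developers may run into a bug like this, and even for those who have not, a question like this encourages continued reflection and review. By coincidence, in April I also hit a similar but more severe bug. It was a deep hole I had dug for myself, and until it was filled the whole project could not go live.

More than a month has passed since then, so while I have the time I want to write a proper summary, in the hope that some of the experience and tools mentioned here will be of some help.

Project background

We made deep modifications to the nginx event framework and the openssl protocol stack to improve nginx's computational performance for full HTTPS handshakes.

Because stock nginx uses the local CPU for RSA computation, the single-core throughput of the ECDHE_RSA algorithm is only about 400 qps. In early testing the concurrent performance was very low: even with 24 cores enabled, it could not exceed 10,000.

The core functionality was finished at the end of last year, and offline testing found no problems. After optimization, performance improved severalfold; in order to test the maximum performance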

system call hardware performance counters ubuntu

痞子三分冷 posted on 2019-12-06 21:55:38
I am working on a project and I would like to obtain the performance counter values (cache, TLB, etc.) of a system call (e.g. read()) before and after the execution of a file. I tried doing this using perf on Ubuntu but was not able to get any results. Is there a way to do it using perf or some other tool? Thanks for the help.

          3.329057 task-clock (msec)         #    0.714 CPUs utilized
                16 context-switches          #    0.005 M/sec
                 0 cpu-migrations            #    0.000 K/sec
               257 page-faults               #    0.077 M/sec
         1,983,212 cycles                    #    0.596 GHz
         1,352,902 stalled-cycles-frontend   #   68.22% frontend cycles idle
         1,080,180 stalled-cycles
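
The counter lines in the output above follow a regular shape (a count, an event name, then an optional "#" comment), so two runs can be compared once the lines are turned into numbers. The following is only a minimal sketch of that idea; the function name parse_perf_stat and the SAMPLE constant are made up for illustration, and the sample text is copied from the complete lines of the excerpt above.

```python
import re

# Matches lines such as "1,983,212 cycles # 0.596 GHz": a count (with optional
# thousands separators or a decimal part) followed by an event name.
LINE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+([\w.:-]+)")


def parse_perf_stat(text):
    """Return {event_name: numeric_value} for each counter line in text."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header, blank line, or a "<not counted>" entry
        raw_value, event = match.groups()
        counters[event] = float(raw_value.replace(",", ""))
    return counters


# Complete counter lines taken from the excerpt above.
SAMPLE = """\
3.329057 task-clock (msec) # 0.714 CPUs utilized
16 context-switches # 0.005 M/sec
0 cpu-migrations # 0.000 K/sec
257 page-faults # 0.077 M/sec
1,983,212 cycles # 0.596 GHz
1,352,902 stalled-cycles-frontend # 68.22% frontend cycles idle
"""

if __name__ == "__main__":
    for event, value in parse_perf_stat(SAMPLE).items():
        print(f"{event}: {value:g}")
```

Parsing two captures this way and subtracting the dictionaries gives a per-event difference, but it still measures the whole process rather than a single system call.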

Which perf events can use PEBS?

ぃ、小莉子 posted on 2019-12-06 08:24:09
Question

I want to understand which events can have the precise modifier on my CPU (Sandy Bridge). The Intel Software Developer's Manual (Table 18-32. PEBS Performance Events for Intel Microarchitecture Code Name Sandy Bridge) contains only the following events: INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED. And SandyBridge_core_V15.json lists the same events with PEBS > 0. However there are some examples of

perf, a fairly good tool for measuring program performance

China☆狼群 posted on 2019-12-06 07:15:37
When facing a problematic program, it is best to take a top-down approach: first get an overall picture of the various statistics events while the program runs, then dig into the details in specific directions. Do not dive straight into trivial details, or you will miss the forest for the trees.

When optimizing your own code, CPU-bound and IO-bound programs call for different treatment:

CPU-bound: the program spends most of its time using the CPU.
IO-bound: follows readily from the definition of CPU-bound.

The perf stat command reports overall statistics for a process:

    $ perf stat ./Joseph_ring

     Performance counter stats for './Joseph_ring':

         19.755435 task-clock                #    0.000 CPUs utilized
               429 context-switches          #    0.022 M/sec
                 5 CPU-migrations            #    0.000 M/sec
               137 page-faults               #    0.007 M/sec
        27,255,530 cycles                    #    1.380 GHz
     <not counted> stalled-cycles-frontend
     <not counted> stalled
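
The "CPUs utilized" figure that perf stat prints is task-clock divided by elapsed wall-clock time, which gives a rough first signal for the CPU-bound versus IO-bound distinction described above. The sketch below illustrates that arithmetic only; the function names, the 0.5 threshold, and the elapsed times in the example are assumptions made for illustration and do not come from the excerpt.

```python
def cpus_utilized(task_clock_ms, elapsed_ms):
    """Average number of CPUs kept busy: CPU time divided by wall-clock time."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return task_clock_ms / elapsed_ms


def rough_class(task_clock_ms, elapsed_ms, threshold=0.5):
    """Coarse label; the threshold is an arbitrary assumption, not a perf rule."""
    if cpus_utilized(task_clock_ms, elapsed_ms) >= threshold:
        return "likely CPU-bound"
    return "likely waiting (IO, sleep, locks)"


if __name__ == "__main__":
    # Hypothetical elapsed times, chosen only to show both outcomes.
    print(rough_class(task_clock_ms=950.0, elapsed_ms=1000.0))
    print(rough_class(task_clock_ms=20.0, elapsed_ms=60000.0))
```

A multithreaded program can report more than 1.0 CPUs utilized, and a low value only says the process was mostly off the CPU, not why, so the label is a starting point for the top-down investigation rather than a conclusion.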

perf mem -D report

和自甴很熟 posted on 2019-12-06 07:06:58
Question

I was using perf mem -t load record "commands" to profile system memory access latency. Afterwards, I ran perf mem -D report and got the following results:

    [root@mdtm-server wenji]# perf mem -D report
    # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
    2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
    2054 2054 0xffffffff81321d6e 0xffff880c7fc87d44 7 0x68100142 /lib/modules/3.12.23/build/vmlinux:ghes_copy_tofrom_phys

What does
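
The DSRC column in the dump above is a packed bit field. As a hedged illustration, the sketch below decodes only its two lowest fields, assuming the layout of union perf_mem_data_src in the Linux uapi header perf_event.h (mem_op in the low 5 bits, mem_lvl in the next 14 bits) and the PERF_MEM_OP_* and PERF_MEM_LVL_* flag values as recalled from that header. The function name decode_dsrc is made up, and the constants should be checked against the header of the kernel in use.

```python
# Flag values assumed from linux/perf_event.h (PERF_MEM_OP_*, PERF_MEM_LVL_*).
MEM_OP = {0x01: "N/A", 0x02: "load", 0x04: "store", 0x08: "prefetch", 0x10: "exec"}
MEM_LVL = {
    0x01: "N/A", 0x02: "hit", 0x04: "miss", 0x08: "L1", 0x10: "LFB",
    0x20: "L2", 0x40: "L3", 0x80: "local RAM",
    0x100: "remote RAM (1 hop)", 0x200: "remote RAM (2 hops)",
    0x400: "remote cache (1 hop)", 0x800: "remote cache (2 hops)",
    0x1000: "I/O", 0x2000: "uncached",
}


def _flags(value, table):
    """Names of all flags set in value, ordered by ascending bit."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_dsrc(dsrc):
    """Decode the operation and memory-level fields of a DSRC value."""
    op = dsrc & 0x1F             # assumed: mem_op occupies bits 0-4
    level = (dsrc >> 5) & 0x3FFF  # assumed: mem_lvl occupies bits 5-18
    return {"op": _flags(op, MEM_OP), "level": _flags(level, MEM_LVL)}


if __name__ == "__main__":
    # The two DSRC values from the dump above.
    for value in (0x68100842, 0x68100142):
        print(hex(value), decode_dsrc(value))
```

Under these assumptions, both samples decode as loads, the first served by L3 and the second by L1. The remaining fields (snoop, lock, TLB) are ignored here.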

Logging Memory Access Footprint

。_饼干妹妹 posted on 2019-12-06 07:01:33
Question

I found mtrace by Dr. Clements. Although it is useful, it does not work properly in the situation I need. I intend to use the record to understand memory access patterns in different scenarios. Can someone share related experience? Any suggestion will be appreciated.

0313 update: I'm trying to use qemu-mtrace to boot Ubuntu 16.04 with linux-mtrace (3.8.0), but it only shows several error messages and terminates. I hope some tool is able to log every access.

    $ ./qemu-system-x86_64 -mtrace-enable

A quick trial of perf-tools

南笙酒味 posted on 2019-12-06 06:30:12
perf-tools is a collection of performance tuning tools written by performance expert Brendan Gregg on top of perf and ftrace. It provides analysis tools for most areas, including I/O, networking, and system calls.

A reference diagram

Installation

Clone the code:

    git clone --depth 1 https://github.com/brendangregg/perf-tools

Basic usage

Check I/O latency:

    ./iolatency -Q

Result:

    ./iolatency -Q
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
           1 -> 2       : 0        |                                      |
           2 -> 4       : 0        |                                      |
           4 -> 8       : 0        |                                      |
           8 -> 16      : 2        |######################################|

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 0        |                                      |
    ^C

Notes
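
To help read the histogram above: each row counts I/Os whose latency falls in a half-open range [low, high) milliseconds, with the ranges doubling in width. The sketch below reproduces that bucketing in a few lines. It is an illustration of the idea and not the iolatency implementation; the function names and the two sample latencies are hypothetical.

```python
def bucket_bounds(latency_ms):
    """Return (low, high) of the power-of-two bucket with low <= latency < high."""
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    if latency_ms < 1:
        return (0, 1)
    low = 1
    while low * 2 <= latency_ms:
        low *= 2
    return (low, low * 2)


def histogram(latencies_ms):
    """Count how many latencies fall into each bucket."""
    counts = {}
    for latency in latencies_ms:
        bounds = bucket_bounds(latency)
        counts[bounds] = counts.get(bounds, 0) + 1
    return counts


def render(counts, width=38):
    """Format rows from 0 -> 1 up to the highest populated bucket."""
    if not counts:
        return []
    peak = max(counts.values())
    top = max(high for _, high in counts)
    lines = []
    low, high = 0, 1
    while high <= top:
        n = counts.get((low, high), 0)
        bar = "#" * (n * width // peak)  # bar length scaled to the largest bucket
        lines.append(f"{low:>4} -> {high:<4} : {n:<5} |{bar:<{width}}|")
        low, high = high, high * 2
    return lines


if __name__ == "__main__":
    # Two hypothetical I/O latencies (ms) that both land in the 8 -> 16 bucket.
    print("\n".join(render(histogram([9.5, 12.0]))))
```

With these two sample values the sketch prints five rows ending at 8 -> 16 with a count of 2, the same shape as the first interval in the output above.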

Event-based sampling with the perf userland tool and PEBS

牧云@^-^@ posted on 2019-12-06 05:31:37
Question

I'm doing event-based sampling with the perf userland tool; the objective is to find out where certain performance-impacting events like branch misses and cache misses are occurring on a larger system I'm working on. Now, something like

    perf record -a -e branch-misses:pp -- sleep 5

works perfectly: the PEBS counting mode triggered by the 'pp' modifier is really accurate when collecting the IP in the samples. Unfortunately, when I try to do the same for cache-misses, i.e. perf record

Linux perf command for cache references

こ雲淡風輕ζ posted on 2019-12-06 04:47:28
Question

I want to measure the cache miss rate of my code. We can use perf list to show the supported events. My desktop has an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz processor, and the perf list output contains cache-references and cache-misses, like this:

    cpu-cycles OR cycles                               [Hardware event]
    stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
    stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
    instructions                                       [Hardware event]
    cache-references                                   [Hardware event]
    cache-misses                                       [Hardware event]

I
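
Given counts for the two events listed above, the miss rate is simply cache-misses divided by cache-references. The sketch below spells out that arithmetic; the function name and the example counts are hypothetical, chosen only to illustrate the formula.

```python
def cache_miss_rate(cache_misses, cache_references):
    """Fraction of counted cache references that missed."""
    if cache_references <= 0:
        raise ValueError("cache_references must be positive")
    return cache_misses / cache_references


if __name__ == "__main__":
    # Hypothetical counts, e.g. as read from a perf stat run with
    # -e cache-references,cache-misses.
    misses, references = 25_000, 1_000_000
    print(f"miss rate: {cache_miss_rate(misses, references):.2%}")
```

On many Intel CPUs these two generic events refer to the last-level cache, so the ratio is likely a last-level miss rate rather than an L1 miss rate; this is worth confirming for the specific processor.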