perf | 易学教程 (Easy-Learn Tutorials)

softlockup/hardlockup: how they work, in detail

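Reposted from https://blog.csdn.net/hzj_001/article/details/100054659. Three mechanisms are involved: the kernel watchdog threads, the high-resolution timer (hrtimer, which raises timer interrupts), and the NMI (non-maskable interrupt) driven by a PMU hardware perf event. Basic idea:
1) soft lockup: preemption is disabled for a long time, so other processes cannot be scheduled;
2) hard lockup: caused by interrupts being disabled for a long time.
How soft lockup detection works:
1) First, a kernel thread called watchdog is registered on every CPU core: [watchdog/0], [watchdog/1], [watchdog/2]…
2) In addition, the system runs a high-resolution timer (hrtimer) that periodically raises a timer interrupt whose callback is watchdog_timer_fn(). This callback does three things:
a. watchdog_interrupt_count() increments the hrtimer_interrupts variable (used by the hard lockup check);
b. wake_up_process() wakes the watchdog thread (which updates the timestamp);
c. is_softlockup() checks whether a soft lockup has occurred.
The soft lockup detector checks the timestamp; if it has not been updated for longer than the soft lockup threshold, this indicates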
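
The per-CPU bookkeeping described above can be modeled in a few lines. The sketch below is a toy Python model, not kernel code. CpuWatchdog and its methods are hypothetical names that only mirror the roles of four pieces: watchdog_timer_fn(), the watchdog thread's timestamp update, is_softlockup(), and the NMI-side comparison of hrtimer_interrupts against a saved copy. Time is passed in explicitly. The threshold is a plain parameter rather than the kernel's configured value.

```python
from dataclasses import dataclass


@dataclass
class CpuWatchdog:
    """Toy model of one CPU's lockup-detector state (hypothetical, not kernel code)."""
    touch_ts: float = 0.0              # last time the watchdog thread ran
    hrtimer_interrupts: int = 0        # incremented by every hrtimer callback
    hrtimer_interrupts_saved: int = 0  # snapshot taken by the previous NMI check

    def timer_fn(self, now: float, thresh: float, thread_can_run: bool) -> bool:
        """Mirror the three steps of watchdog_timer_fn(); return True on a soft lockup."""
        # a. count this timer interrupt (consumed later by the hard-lockup check)
        self.hrtimer_interrupts += 1
        # b. wake the watchdog thread; it only gets to run (and refresh its
        #    timestamp) if the CPU is not stuck with preemption disabled
        if thread_can_run:
            self.touch_ts = now
        # c. soft lockup: the timestamp is older than the threshold
        return now - self.touch_ts > thresh

    def nmi_check(self) -> bool:
        """Hard lockup: no timer interrupt has arrived since the previous NMI."""
        stuck = self.hrtimer_interrupts == self.hrtimer_interrupts_saved
        self.hrtimer_interrupts_saved = self.hrtimer_interrupts
        return stuck


if __name__ == "__main__":
    cpu = CpuWatchdog()
    print(cpu.timer_fn(now=4.0, thresh=20.0, thread_can_run=True))    # False
    print(cpu.timer_fn(now=30.0, thresh=20.0, thread_can_run=False))  # True: 26 s stale
    print(cpu.nmi_check())  # False: timer interrupts arrived since the last NMI
    print(cpu.nmi_check())  # True: no timer interrupt in between
```

The model collapses two things into the thread_can_run flag: the thread being woken and the thread actually being scheduled. In the real kernel, wake_up_process() only makes the thread runnable. A soft lockup is exactly the case where it stays runnable but never gets the CPU.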

A roundup of Linux performance analysis tools

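I put together this article out of interest in the Linux operating system and a strong desire to understand its internals. It can also serve as a checklist of fundamentals, and it covers many aspects of a system. Without solid knowledge of computer systems, networking, and operating systems, you cannot fully master the tools described here; system performance analysis and optimization is also a long-term effort. The document is based mainly on the blog posts in which Brendan Gregg, Linux expert and senior performance architect at Netflix, updates his Linux performance tuning tools. I collected articles on Linux performance optimization and combined them into one comprehensive piece that explains, following those posts, the underlying principles and the performance testing tools involved. Background knowledge: some background is needed when analyzing performance problems, for example hardware caches or the operating system kernel. An application's behavior is often entangled with these layers, and they can affect its performance in unexpected ways. Some programs fail to make full use of the cache and lose performance as a result; others make too many unnecessary system calls, causing frequent kernel/user mode switches. This is only groundwork for what follows; there is much more to tuning, far more than I know, and I hope we can all learn and improve together. [Performance analysis tools] Let's start with a figure: it comes from a performance-analysis talk by Brendan Gregg. Every tool in it has a man page, and the common usage of each is briefly introduced below: ▲ vmstat -- virtual memory statistics vmstat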

Perf tool stat output: multiplex and scaling of “cycles”

Question: I am trying to understand the multiplexing and scaling of the "cycles" event in the perf output. The following is the output of the perf tool:

    144094.487583      task-clock (msec)         #    1.017 CPUs utilized
     539912613776      instructions              #    1.09  insn per cycle         (83.42%)
     496622866196      cycles                    #    3.447 GHz                    (83.48%)
        340952514      cache-misses              #   10.354 % of all cache refs    (83.32%)
       3292972064      cache-references          #   22.854 M/sec                  (83.26%)
    144081.898558      cpu-clock (msec)          #    1.017 CPUs utilized
          4189372      page-faults               #    0.029 M/sec
                0      major
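
Two kinds of arithmetic sit behind these numbers. The percentage in parentheses is the share of the event's enabled time during which it was actually counting on a PMU counter (time_running / time_enabled). When events are multiplexed, perf stat extrapolates the raw count by time_enabled / time_running. The values after # are ratios of the printed counts. The sketch below assumes you already have the raw value and the two times. perf_event_open returns them when read_format includes PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING. The function names are hypothetical; the example recomputes three of the ratios from the output above.

```python
def scale_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Extrapolate a multiplexed counter: raw * time_enabled / time_running."""
    if time_running == 0:
        raise ValueError("event never ran on a hardware counter")
    return raw * time_enabled / time_running


def shadow_metrics(task_clock_ms: float, instructions: int, cycles: int,
                   cache_misses: int, cache_refs: int) -> dict:
    """Recompute some of the ratios perf stat prints after '#'."""
    seconds = task_clock_ms / 1000.0
    return {
        "insn_per_cycle": instructions / cycles,
        "ghz": cycles / seconds / 1e9,          # cycles per second of task-clock
        "cache_miss_pct": 100.0 * cache_misses / cache_refs,
    }


if __name__ == "__main__":
    m = shadow_metrics(144094.487583, 539912613776, 496622866196,
                       340952514, 3292972064)
    print("{insn_per_cycle:.2f} insn/cycle, {ghz:.3f} GHz, "
          "{cache_miss_pct:.3f} % misses".format(**m))
    # "cycles" was on a counter for 83.48% of its enabled time, so the raw
    # count behind the printed (already scaled) value was roughly:
    print(round(496622866196 * 0.8348))
```

The recomputed ratios should agree with the 1.09 insn per cycle, 3.447 GHz and 10.354 % printed by perf. This suggests that the printed cycles and instructions values are already the scaled estimates, not the raw counts.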

Understanding the perf report

Question: I have been working on a time-sensitive project. Because of some undesired spikes in the timing, I had to dig a bit deeper. Scenario: I have a kernel module that is pinned to a CPU core. This CPU core is also listed in isolcpus in the kernel boot parameters. Here is what I have set in the kernel boot parameters (cmdline):

    intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_idle.max_cstate=0 processor.max_cstate=0 nohz_full=7-11 isolcpus=7-11 mce=off rcu_nocbs=7-11
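
Three of these parameters take a CPU list: isolcpus, nohz_full and rcu_nocbs. A quick sanity check is to expand each list and confirm that the core the module is pinned to appears in all three. The sketch below handles only the plain comma/range form (e.g. 7-11 or 0,2,4-6). It does not understand isolcpus flag prefixes or other extended list syntax. parse_cpu_list and parse_cmdline are hypothetical helper names, and pinned_cpu = 7 is an assumed value. On a running system the string would come from /proc/cmdline.

```python
from typing import Optional


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a simple CPU list such as "7-11" or "0,2,4-6" into CPU numbers."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        elif part:
            cpus.append(int(part))
    return cpus


def parse_cmdline(cmdline: str) -> dict[str, Optional[str]]:
    """Map each boot parameter to its value; bare flags map to None."""
    params: dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


CMDLINE = ("intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G "
           "hugepages=1 intel_idle.max_cstate=0 processor.max_cstate=0 "
           "nohz_full=7-11 isolcpus=7-11 mce=off rcu_nocbs=7-11")

if __name__ == "__main__":
    params = parse_cmdline(CMDLINE)
    pinned_cpu = 7  # hypothetical: the core the kernel module is pinned to
    for name in ("isolcpus", "nohz_full", "rcu_nocbs"):
        print(name, pinned_cpu in parse_cpu_list(params[name]))
```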

perf: strange relation between software events

Question: Okay, so this really bugs me. I'm using perf to record the cpu-clock event (a software event):

    $ > perf record -e cpu-clock srun -n 1 ./stream
    ...

and the table produced by perf report is empty. When I use perf to record all available software events listed in perf list:

    $ > perf record -e alignment-faults,context-switches,cpu-clock,cpu-migrations,\
    dummy,emulation-faults,major-faults,minor-faults,page-faults,task-clock \
    srun -n 1 ./stream
    ...

the table gives me a list of available samples: 0

system call hardware performance counters ubuntu

Question: I am working on a project and would like to obtain the values of the performance counters (cache, TLB, etc.) for a system call (e.g. read()) before and after the execution of a file. I tried doing this using perf on Ubuntu but was not able to get any results. Is there a way to do it using perf, or perhaps some other tool? Thanks for the help.

         3.329057      task-clock (msec)         #    0.714 CPUs utilized
               16      context-switches          #    0.005 M/sec
                0      cpu-migrations            #    0.000 K/sec
              257      page-faults               #    0.077 M/sec
        1,983,212      cycles                    #    0
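
perf stat reports totals for a whole command or process, not for one call inside it. For a before/after measurement around a single read(), one option is to take counter snapshots inside the process itself. The sketch below substitutes getrusage() (Python's resource module) for that. It exposes only software-level counts such as page faults and context switches, similar to some of the numbers above. It does not provide the cache or TLB hardware counters, which would need perf_event_open(). It also counts everything the process does between the two snapshots, interpreter overhead included. rusage_delta and read_file are hypothetical helper names.

```python
import os
import resource
import tempfile

# Minor/major page faults, voluntary/involuntary context switches.
FIELDS = ("ru_minflt", "ru_majflt", "ru_nvcsw", "ru_nivcsw")


def rusage_delta(func, *args):
    """Return func's result and the change in selected getrusage() counters."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return result, {f: getattr(after, f) - getattr(before, f) for f in FIELDS}


def read_file(path):
    """One open()/read()/close() sequence around the read() call being measured."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 1 << 20)  # read up to 1 MiB
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    try:
        data, delta = rusage_delta(read_file, tmp.name)
        print(len(data), delta)
    finally:
        os.unlink(tmp.name)
```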

How to get perf_event results for 2nd Nexus7 with Krait CPU

Question: all. I am trying to get PMU information such as instructions, cycles, cache misses, etc. on a second-generation Nexus 7 with a Krait CPU. The perf tool is not working correctly, so I am following sample source code from the perf_event tutorials:

    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/perf_event.h>
    #include <asm/unistd.h>

    static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long

Professor Song Tian's Monte Carlo computation of π in Python: a detailed code walkthrough

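    from random import random
    from time import perf_counter

    start = perf_counter()
    RUNS = 10
    total = 0.0  # renamed from `sum`, which shadowed the built-in
    for _ in range(RUNS):
        zheng = 10000 * 10000   # number of points: plays the role of the square's area
        yuan = 0                # points inside the circle: plays the role of the circle's area
        for _ in range(zheng):  # was range(1, zheng), which drew one point too few
            a, b = random(), random()
            r = pow(a**2 + b**2, 0.5)  # distance to the center (Pythagorean theorem)
            if r <= 1:
                yuan += 1
        pi = 4 * (yuan / zheng)
        total += pi
    print('Average of {} runs: π={:.7f}'.format(RUNS, total / RUNS))
    print('Time for {} x 10^8 iterations: {:.2f} s'.format(RUNS, perf_counter() - start))

Source: CSDN  Author: qqfushi  Link: https://blog.csdn.net/qqfushi/article/details/103805000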
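
The factor 4 comes from the geometry. The points are uniform in the unit square, and the quarter disc a² + b² ≤ 1 covers π/4 of that square, so yuan / zheng ≈ π/4. The sketch below is a smaller, seeded variant of the same estimator. The function name estimate_pi, the seed and the sample counts are illustrative choices, not from the original. It compares a² + b² with 1 directly, which is the same test as the square root in the original without computing it. It also runs far faster than the 10 x 10^8 pure-Python iterations above.

```python
import math
import random


def estimate_pi(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of π from n uniform points in the unit square."""
    rng = random.Random(seed)  # private, seeded generator for reproducibility
    inside = 0
    for _ in range(n):
        a, b = rng.random(), rng.random()
        if a * a + b * b <= 1.0:  # same test as sqrt(a² + b²) <= 1
            inside += 1
    return 4.0 * inside / n       # inside / n ≈ π / 4


if __name__ == "__main__":
    for n in (1_000, 100_000):
        est = estimate_pi(n)
        print(n, est, abs(est - math.pi))
```

The statistical error shrinks roughly like 1/√n, so each extra correct decimal digit costs about 100 times more points.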

Can I get the python call stack with the linux perf?

Question: For example,

    def test():
        print "test"

I used perf record -g -p $pid, but the results were all about PyEval_EvalFrameEx. How can I get the real name "test", or is that not possible with perf? Answer 1: As of 2018, perf simply doesn't support reading Python stack frames (cf. a 2014 Python mailing-list discussion). Python 3.6 has some support for DTrace and SystemTap. An alternative is Pyflame, a stochastic profiler for Python that samples Python call stacks via ptrace(). In
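
Stochastic profilers such as Pyflame work by periodically sampling the Python-level call stack and counting which functions appear. That idea can be sketched in-process with only the standard library. This is not how Pyflame itself works, since Pyflame inspects another process via ptrace(). This sketch instead uses a SIGPROF interval timer inside the profiled process, so it works only on Unix, only from the main thread, and adds overhead. sample_stacks is a hypothetical helper. Each sample is stored as a folded stack ("outer;inner", the text format flame-graph tools consume), so test shows up by name rather than as PyEval_EvalFrameEx.

```python
import collections
import signal
import traceback


def sample_stacks(func, interval=0.001):
    """Run func() and sample its Python call stack on every SIGPROF tick."""
    counts = collections.Counter()

    def on_tick(signum, frame):
        # Walk from the interrupted frame outwards and fold it as "outer;inner".
        names = [entry.name for entry in traceback.extract_stack(frame)]
        counts[";".join(names)] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)  # CPU-time timer
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # disarm before restoring
        signal.signal(signal.SIGPROF,
                      previous if previous is not None else signal.SIG_DFL)
    return counts


def test():
    # CPU-bound stand-in for the question's test(), so the timer fires inside it.
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for stack, n in sample_stacks(test).most_common(3):
        print(stack, n)
```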

Logging all memory accesses of any executable/process in Linux

Question: I have been looking for a way to log all memory accesses of a process/execution in Linux. I know questions on this topic have been asked here before, such as Logging memory access footprint of whole system in Linux, but I wanted to know whether there is any non-instrumentation tool that does this. I am not looking for QEMU/Valgrind for this purpose, since they would be a bit slow and I want as little overhead as possible. I looked at perf mem and PEBS events like cpu/mem-loads