perf

Why do Perf and PAPI give different values for L3 cache references and misses?

╄→尐↘猪︶ㄣ submitted on 2019-12-03 02:12:32
I am working on a project where we have to implement an algorithm that is proven in theory to be cache friendly. In simple terms, if N is the input size and B is the number of elements transferred between the cache and RAM on each cache miss, the algorithm requires O(N/B) accesses to RAM. I would like to show that this is indeed the behavior in practice. To better understand how to measure various cache-related hardware counters, I decided to use different tools. One is Perf and the other is the PAPI library. Unfortunately, the more I work with these tools,
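As a rough illustration of the O(N/B) bound described above, the following sketch estimates how many cache-line transfers one sequential scan of N elements needs. The function name, the 8-byte element size and the 64-byte line size are assumptions chosen for illustration, and the array is assumed to start on a line boundary. Counts reported by Perf or PAPI will generally differ because of prefetching and unrelated traffic.

```python
def expected_block_transfers(n_elements: int,
                             element_bytes: int = 8,
                             line_bytes: int = 64) -> int:
    """Estimate cache-line transfers for one sequential scan.

    Hypothetical helper: B = line_bytes / element_bytes elements move per
    miss, so a scan of N elements touches ceil(N / B) lines.
    """
    if n_elements < 0 or element_bytes <= 0 or line_bytes <= 0:
        raise ValueError("sizes must be positive and N non-negative")
    total_bytes = n_elements * element_bytes
    # Integer ceiling division avoids floating-point rounding for large N.
    return (total_bytes + line_bytes - 1) // line_bytes
```

With the default sizes, B is 8 elements, so a scan of one million elements corresponds to 125,000 line transfers, which gives a baseline to compare measured miss counts against.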

Is there a way to set kptr_restrict to 0?

五迷三道 submitted on 2019-12-03 02:07:38
I am currently having trouble running Linux perf, mostly because /proc/sys/kernel/kptr_restrict is currently set to 1. However, if I try to change /proc/sys/kernel/kptr_restrict by echoing 0 to it as follows: echo 0 > /proc/sys/kernel/kptr_restrict I get a permission denied error. I don't think I can change permissions on it either. Is there a way to set this directly somehow? I am superuser. I don't think perf will function acceptably without this being set. Jun Ge: In your example, echo is running as root, but your shell is running as you. So please try this command: sudo sh -c " echo 0 > /proc
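The point of the answer above is that the redirection is performed by the calling shell, not by the privileged command, so the whole `echo ... > file` string has to run inside a root shell. The sketch below only shows how such a command line could be assembled from Python. The helper names are hypothetical, the path is the one named in the question, and nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess
from typing import List


def build_root_write_command(path: str, value: int) -> List[str]:
    """Return an argv whose redirection happens inside the root shell."""
    # The redirection is part of the string handed to `sh -c`, so it is
    # evaluated by the shell that sudo starts, not by the caller.
    return ["sudo", "sh", "-c", f"echo {int(value)} > {shlex.quote(path)}"]


def write_as_root(path: str, value: int, dry_run: bool = True) -> List[str]:
    """Build the command and, only if dry_run is False, execute it."""
    argv = build_root_write_command(path, value)
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

Calling `write_as_root("/proc/sys/kernel/kptr_restrict", 0)` returns the argv without touching the system, which makes it easy to inspect before running it for real.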

How can I get perf to find symbols in my program

守給你的承諾、 submitted on 2019-12-03 01:43:26
Question: When using perf report, I don't see any symbols for my program; instead I get output like this:
$ perf record /path/to/racket ints.rkt 10000
$ perf report --stdio
# Overhead  Command   Shared Object      Symbol
# ........  ........  .................  ......
#
    70.06%  ints.rkt  [unknown]          [.] 0x5f99b8
    26.28%  ints.rkt  [kernel.kallsyms]  [k] 0xffffffff8103d0ca
     3.66%  ints.rkt  perf-32046.map     [.] 0x7f1d9be46650
Which is fairly uninformative. The relevant program is built with debugging symbols, and the sysprof
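To quantify how much of a profile is unresolved, one could parse the `perf report --stdio` rows and collect those whose symbol column is still a raw hex address. This is a minimal sketch: the regular expression assumes that command and shared-object names contain no spaces, the names `ReportRow`, `parse_report` and `unresolved` are hypothetical, and the sample text is the output quoted in the question.

```python
import re
from typing import List, NamedTuple


class ReportRow(NamedTuple):
    overhead: float      # percentage of samples
    command: str
    shared_object: str
    symbol: str


# overhead%, command, shared object, [privilege marker], symbol
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[.\]\s+(.+?)\s*$")
HEX_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")

SAMPLE = """\
# Overhead  Command   Shared Object      Symbol
#
    70.06%  ints.rkt  [unknown]          [.] 0x5f99b8
    26.28%  ints.rkt  [kernel.kallsyms]  [k] 0xffffffff8103d0ca
     3.66%  ints.rkt  perf-32046.map     [.] 0x7f1d9be46650
"""


def parse_report(text: str) -> List[ReportRow]:
    """Parse data rows, skipping '#' header lines and anything unmatched."""
    rows = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ROW.match(line)
        if match:
            rows.append(ReportRow(float(match.group(1)), match.group(2),
                                  match.group(3), match.group(4)))
    return rows


def unresolved(rows: List[ReportRow]) -> List[ReportRow]:
    """Rows whose symbol is only a hex address, i.e. not symbolized."""
    return [row for row in rows if HEX_ADDRESS.fullmatch(row.symbol)]
```

Running `unresolved(parse_report(SAMPLE))` on the quoted output returns all three rows, which matches the complaint that no symbols were resolved.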

Use perf inside a docker container without --privileged

Anonymous (unverified) submitted on 2019-12-03 01:33:01
Question: I am trying to use the perf tool inside a Docker container to record a given command. kernel.perf_event_paranoid is set to 1, but the container behaves just as if it were 2 when I don't pass the --privileged flag. I could use --privileged, but the code I am running perf on is not trusted, and while I am OK with taking a slight security risk by allowing the perf tool, giving privileged rights to the container seems a different level of risk. Is there any other way to use perf inside the container? ~$ docker version Client: Version: 17.03.1-ce API
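To make the levels mentioned in the question concrete, the sketch below reads kernel.perf_event_paranoid and reports the strictest restriction that applies at that level. The descriptions paraphrase the kernel's sysctl documentation as I understand it, and some distributions define additional levels, so treat them as an assumption to check against your kernel. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_paranoid(path: str = PARANOID_PATH) -> Optional[int]:
    """Return the current level, or None if the file is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_paranoid(level: int) -> str:
    """Strictest restriction for users without CAP_SYS_ADMIN at this level.

    Restrictions are cumulative: every lower level's limit also applies.
    """
    if level >= 2:
        return "no kernel profiling for unprivileged users"
    if level == 1:
        return "no CPU event access for unprivileged users"
    if level == 0:
        return "no raw tracepoint access for unprivileged users"
    return "almost all events allowed for all users"
```

Comparing `read_paranoid()` on the host and inside the container shows whether both see the same sysctl value, which helps separate a sysctl difference from some other restriction on the container.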

Automapper - ReverseMap() does not perform mapping

Anonymous (unverified) submitted on 2019-12-03 01:20:02
Question: I have the following 2 classes: public class ReferenceEngine { public Guid ReferenceEngineId { get; set; } public string Description { get; set; } public int Horsepower { get; set; } } public class Engine { public Guid Id { get; set; } public string Description { get; set; } public int Power { get; set; } } I am using AutoMapper to perform a mapping from ReferenceEngine to Engine and vice versa. Notice that the properties ReferenceEngineId / Id and Horsepower / Power do not have the same names. The following mapping configuration works and

Profiling sleep times with perf

Anonymous (unverified) submitted on 2019-12-03 01:14:02
Question: I was looking for a way to find out where my program spends time. I read the perf tutorial and tried to profile sleep times as described there. I wrote the simplest possible program to profile:
#include <unistd.h>
int main() { sleep(10); return 0; }
Then I executed it with perf:
$ sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB /home/

perf: strange relation between software events

Anonymous (unverified) submitted on 2019-12-03 00:44:02
Question: Okay, so this really bugs me. I'm using perf to record the cpu-clock event (a software event):
$ > perf record -e cpu-clock srun -n 1 ./stream
... and the table produced by perf report is empty. I'm using perf to record all available software events listed in perf list:
$ > perf record -e alignment-faults,context-switches,cpu-clock,cpu-migrations,dummy,emulation-faults,major-faults,minor-faults,page-faults,task-clock srun -n 1 ./stream
... the table gives me a list of available samples: 0 alignment-faults 125 context-switches 255 cpu

Hands-on practice with interpret, Microsoft's open-source interpretable machine learning framework

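人走茶凉 submitted on 2019-12-02 23:12:07
Machine learning and deep learning often feel like a black box: their interpretability is low or very low, which makes them harder to learn and to use, so it would be great if machine learning results could be explained better. Today I do a simple hands-on exercise with interpret, Microsoft's open-source interpretable machine learning framework, mainly to try out my freshly configured Jupyter environment by running some code. The GitHub address is here. The basic requirement is Python 3.5 or later; I happened to use a Python 3.6 kernel for this experiment. Installing interpret is simple, with the following commands:
pip install numpy scipy pyscaffold
pip install -U interpret
Although the installation method is simple, the process itself felt rather long to me, probably because many of my local dependencies were at old versions: more than ten packages were uninstalled and reinstalled at newer versions along the way. Once installation finishes we can start a simple exercise (using the Boston housing price data as the example): First, a simple exploratory visualization of the dataset: The result is as follows: We can choose different attributes to display from the summary drop-down: For example, here we choose the first one, and the result is as follows: Next, import the regression model: Look at the global interpretability: Different information can again be chosen from the drop-down; here too the first entry is used as the example: Next, look at the local interpretability: Personally I think this part is very important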

Machine Learning (17): Microsoft's InterpretML, interpretability for machine learning models

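纵饮孤独 submitted on 2019-12-02 23:11:28
Introduction to InterpretML: fit interpretable models, explain blackbox machine learning, and visualize "black-box" machine learning. InterpretML is an open-source package for training interpretable models and explaining black-box systems. Interpretability matters mainly in the following areas:
Model debugging - where did the model go wrong?
Detecting bias - what discrimination does the model exhibit?
Policy learning - does the model satisfy certain rule requirements?
High-risk applications - healthcare, finance, justice and so on
Historically, the most intelligible models were not very accurate, and the most accurate models were not intelligible. Microsoft Research has developed an algorithm called the Explainable Boosting Machine (EBM) that offers both high accuracy and intelligibility. EBM uses modern machine learning techniques such as bagging and boosting to breathe new life into traditional GAMs (Generalized Additive Models). This makes them as accurate as random forests and gradient boosted trees, while also improving their intelligibility and editability. In addition to EBM, InterpretML also supports methods such as LIME, SHAP, linear models, partial dependence, decision trees and rule lists. The package makes it easy to compare and contrast models to find the one that best suits your needs.
Installation: Python 3.5+ | Linux, Mac OS X, Windows
pip install numpy scipy pyscaffold
pip install -U interpret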
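The intelligibility of a GAM, as described above, comes from its additive form: a prediction is an intercept plus one independent term per feature, so each feature's contribution can be read off directly. The sketch below shows only that prediction step. It is not EBM's bagging and boosting training procedure and does not use the interpret API; the feature names and shape functions are made up for illustration.

```python
from typing import Callable, Dict, Tuple


def additive_predict(intercept: float,
                     shape_functions: Dict[str, Callable[[float], float]],
                     sample: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the prediction and the per-feature contributions of a GAM.

    prediction = intercept + sum over features of f_feature(x_feature)
    """
    contributions = {name: f(sample[name])
                     for name, f in shape_functions.items()}
    return intercept + sum(contributions.values()), contributions


# Hypothetical shape functions for a housing-price style model.
SHAPES = {
    "rooms": lambda rooms: 2.0 * (rooms - 6),          # more rooms, higher price
    "age": lambda age: -0.5 if age > 50 else 0.5,       # old buildings cost less
}

prediction, parts = additive_predict(20.0, SHAPES, {"rooms": 7, "age": 60})
print(prediction, parts)
```

Because the terms do not interact, the `parts` dictionary is itself the explanation of a single prediction, and changing one shape function alters the model without affecting the other features.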

Getting user-space stack information from perf

大兔子大兔子 submitted on 2019-12-02 17:44:42
I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query. I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity. It's easy to record block I/O requests and completions with, e.g.: sudo perf record -g -T -u postgres -e 'block:block_rq_*' and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot