perf

Use perf inside a docker container without --privileged

徘徊边缘, submitted 2019-12-30 08:13:09
Question: I am trying to use the perf tool inside a Docker container to record a given command. kernel.perf_event_paranoid is set to 1, but the container behaves just as if it were 2 when I don't pass the --privileged flag. I could use --privileged, but the code I am running perf on is not trusted. I am OK with taking a slight security risk by allowing the perf tool, but giving the container privileged rights seems like a different level of risk. Is there any other way to use perf inside the container?
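Before comparing the container with the host, it can help to spell out what each paranoid level restricts. This is a minimal sketch. The level descriptions are paraphrased from the kernel's sysctl documentation, and `restrictions` and `read_paranoid` are hypothetical helper names. The sketch only interprets the value read from /proc. It does not model any extra restrictions a container runtime may add on top of that value.

```python
from pathlib import Path
from typing import List, Optional

# Restrictions an unprivileged process (no CAP_SYS_ADMIN, or CAP_PERFMON on
# Linux 5.8+) faces at each perf_event_paranoid level, paraphrased from the
# kernel's sysctl documentation. Higher levels include all lower restrictions.
_RESTRICTIONS = [
    (0, "no ftrace function tracepoints or raw tracepoints"),
    (1, "no CPU-wide events, only per-process counting"),
    (2, "no kernel profiling, only user-space events"),
]


def restrictions(level: int) -> List[str]:
    """Return the restrictions in force at `level` (-1 means almost none)."""
    return [text for threshold, text in _RESTRICTIONS if level >= threshold]


def read_paranoid(path: str = "/proc/sys/kernel/perf_event_paranoid") -> Optional[int]:
    """Read the current setting, or return None if the file does not exist."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


if __name__ == "__main__":
    level = read_paranoid()
    print("perf_event_paranoid =", level)
    for item in restrictions(level) if level is not None else []:
        print("  -", item)
```

Running it both on the host and inside the container shows whether the two environments even report the same value.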

PERF STAT does not count memory-loads but counts memory-stores

陌路散爱, submitted 2019-12-29 05:32:26
Question: Linux kernel: 4.10.0-20-generic (also tried on 4.11.3). Ubuntu: 17.04.
I have been trying to collect memory-access statistics using perf stat. I can collect stats for memory stores, but the count for memory loads is 0. Below are the details for memory stores:

    perf stat -e cpu/mem-stores/u ./libquantum_base.arnab 100
    N = 100, 37 qubits required
    Random seed: 33
    Measured 3277 (0.200012), fractional approximation is 1/5.
    Odd denominator, trying to expand by 2.
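The event argument `cpu/mem-stores/u` uses perf's `pmu/terms/modifiers` syntax. Here the PMU is `cpu`, the named event is `mem-stores`, and the modifier is `u`, which counts in user space only. A small parser helps keep the pieces straight when scripting side-by-side runs of mem-stores and mem-loads. This is a minimal sketch, and `parse_event` is a hypothetical helper. It only splits the string. It does not check whether the PMU accepts the terms, and it ignores the `event:modifier` shorthand.

```python
from typing import Dict, NamedTuple, Optional


class EventSpec(NamedTuple):
    pmu: str
    terms: Dict[str, Optional[str]]  # term name -> value (None for a bare name)
    modifiers: str


def parse_event(spec: str) -> EventSpec:
    """Split a 'pmu/term[,term...]/modifiers' string such as 'cpu/mem-stores/u'."""
    parts = spec.split("/")
    if len(parts) != 3 or not parts[0] or not parts[1]:
        raise ValueError(f"not a pmu/terms/modifiers spec: {spec!r}")
    pmu, body, modifiers = parts
    terms: Dict[str, Optional[str]] = {}
    for term in body.split(","):
        # 'event=0xd0' -> ('event', '0xd0'); 'mem-stores' -> ('mem-stores', None)
        name, sep, value = term.partition("=")
        terms[name] = value if sep else None
    return EventSpec(pmu, terms, modifiers)
```

`parse_event("cpu/mem-loads/u")` yields the same structure, with `mem-loads` as the only term.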

Using pprof to analyze perf data

眉间皱痕, submitted 2019-12-25 16:08:48
The metrics collected by the perf tool can be viewed with its built-in report and script subcommands, and flamescope works well for flame graphs. If you need to analyze the data across platforms, though, combining pprof with perf_data_converter is very convenient. Below is a simple end-to-end example.

Building perf_data_converter (on CentOS)

Installing perf_data_converter requires the Bazel build tool and a few dependencies.

Install the dependencies:
    yum install -y elfutils-libelf-devel
    yum install -y libcap-devel

Clone the code and build:
    git clone https://github.com/google/perf_data_converter.git
    cd perf_data_converter
    bazel build src:perf_to_profile

Configure the environment: add perf_data_converter to your PATH.

Generate a perf.data file:
    perf record

Convert perf.data:
    perf_to_profile -i perf.data -o perf-convert

Result:
    perf_to_profile -i perf.data -o perf-convert
    [WARNING:src/quipper
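The record-and-convert steps of this walkthrough can be scripted. This is a minimal sketch. It assumes that perf and the perf_to_profile binary built with Bazel are both on PATH. `build_pipeline` and `run_pipeline` are hypothetical helper names, and `./my_program` below is a placeholder target.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_pipeline(cmd: Sequence[str], data: str = "perf.data",
                   out: str = "perf-convert") -> List[List[str]]:
    """Return the walkthrough's commands: record a profile, then convert it."""
    return [
        ["perf", "record", "-o", data, "--", *cmd],
        ["perf_to_profile", "-i", data, "-o", out],
    ]


def run_pipeline(cmd: Sequence[str], data: str = "perf.data",
                 out: str = "perf-convert") -> None:
    # Fail early if a tool is missing, e.g. perf_to_profile was built
    # but never added to PATH.
    for tool in ("perf", "perf_to_profile"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    for argv in build_pipeline(cmd, data, out):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

`run_pipeline(["./my_program"])` records `./my_program` into perf.data and writes the converted profile to perf-convert, which can then be opened with pprof.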

FMA instruction showing up as three packed double operations?

China☆狼群, submitted 2019-12-23 19:03:34
Question: I'm analyzing a piece of linear algebra code which calls intrinsics directly, e.g.

    v_dot0 = _mm256_fmadd_pd( v_x0, v_y0, v_dot0 );

My test script computes the dot product of two double-precision vectors of length 4 (so only one call to _mm256_fmadd_pd is needed), repeated 1 billion times. When I count the number of operations with perf, I get something like the following:

    Performance counter stats for './main':
    0 r5380c7 (skl::FP_ARITH:512B_PACKED_SINGLE) (49.99%)
    0 r5340c7 (skl::FP_ARITH:512B
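The raw codes in that output appear to be complete IA32_PERFEVTSEL register values rather than bare event/umask pairs. For example, 0x5380c7 splits into event select 0xC7, UMASK 0x80, and the USR, OS, INT and EN enable bits. This is a minimal decoder sketch that assumes the register layout from the Intel SDM. `decode_raw` is a hypothetical helper and is not part of perf.

```python
from typing import Dict, List, Union

# Flag bits 16-23 of an Intel IA32_PERFEVTSELx value (Intel SDM, Vol. 3).
_FLAG_BITS = {16: "usr", 17: "os", 18: "edge", 19: "pc",
              20: "int", 21: "any", 22: "en", 23: "inv"}


def decode_raw(raw: str) -> Dict[str, Union[int, List[str]]]:
    """Split a raw perf event such as 'r5380c7' into PERFEVTSEL fields."""
    value = int(raw[1:] if raw.startswith("r") else raw, 16)
    return {
        "event": value & 0xFF,          # bits 0-7: event select
        "umask": (value >> 8) & 0xFF,   # bits 8-15: unit mask
        "flags": [name for bit, name in sorted(_FLAG_BITS.items())
                  if (value >> bit) & 1],
        "cmask": (value >> 24) & 0xFF,  # bits 24-31: counter mask
    }
```

Decoding both codes shows they share event 0xC7 and differ only in the UMASK (0x80 vs 0x40). This is consistent with the two labels naming different FP_ARITH sub-events.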

Why does Linux perf use event l1d.replacement for “L1 dcache misses” on x86?

余生颓废, submitted 2019-12-23 09:21:47
Question: On Intel x86, Linux uses the event l1d.replacement to implement its L1-dcache-load-misses event. This event is defined as follows:

    Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.

Perhaps naively, I would have expected perf to use something like mem_load_retired.l1_miss, which supports PEBS and is defined as:

    Counts retired load instructions with at least one uop that missed in the L1 cache.
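The two definitions count different things. One counts cache lines replaced, and the other counts retired load instructions that missed. The toy model below makes that distinction concrete. It is a fully associative LRU cache of 512 64-byte lines (32 KiB). It assumes write-allocate, so stores also bring lines in. It is an illustration of the definitions only and not how the hardware events are implemented. It ignores prefetching, several loads hitting a line that is already in flight, and speculation, so it says nothing about how the real counters compare on a given CPU. `simulate` is a hypothetical helper.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

LINE = 64  # bytes per cache line (assumed)


def simulate(accesses: Iterable[Tuple[str, int]],
             capacity_lines: int = 512) -> Tuple[int, int]:
    """Toy write-allocate LRU cache; returns (line_fills, load_misses).

    line_fills  ~ lines brought into the cache by loads *or* stores
    load_misses ~ load instructions whose line was not resident
    """
    cache: "OrderedDict[int, None]" = OrderedDict()
    fills = load_misses = 0
    for kind, addr in accesses:
        line = addr // LINE
        if line in cache:
            cache.move_to_end(line)  # refresh LRU position on a hit
            continue
        fills += 1  # any miss allocates a line (write-allocate assumption)
        if kind == "load":
            load_misses += 1
        cache[line] = None
        if len(cache) > capacity_lines:
            cache.popitem(last=False)  # evict the least recently used line
    return fills, load_misses
```

A store-only loop over four lines yields four fills but zero load misses in this model. Eight 8-byte loads per line over two lines yield two of each.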

error: perf.data file has no samples

只谈情不闲聊, submitted 2019-12-23 05:33:11
Question: I'm currently learning to use perf. I have output for hardware events, but not for software events like cpu-cycles or cpu-clock. I invoked perf with the verbose option:

    $ > perf record -v ./pi-serial-ps
    mmap size 528384B
    Reference Pi: 3.1415926536
    Simulated Pi: 3.1415209778
    [ perf record: Woken up 15 times to write data ]
    Looking at the vmlinux_path (7 entries long)
    Using /proc/kallsyms for symbols
    [ perf record: Captured and wrote 3.694 MB perf.data (96497 samples) ]
    Invoking perf record
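The closing `[ perf record: Captured and wrote ... ]` line already reports how many samples were written. A script can therefore stop before running perf report when that number is zero. This is a minimal sketch. It assumes the summary format shown above, and it also accepts a `~` before the count, which some perf versions print. Capture perf's stderr, where the summary normally appears. `parse_record_summary` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches perf record's summary line, e.g.
# "[ perf record: Captured and wrote 3.694 MB perf.data (96497 samples) ]"
_SUMMARY = re.compile(
    r"Captured and wrote (?P<size>[\d.]+) MB (?P<file>\S+) "
    r"\(~?(?P<samples>\d+) samples\)"
)


def parse_record_summary(text: str) -> Optional[Tuple[float, str, int]]:
    """Return (size_mb, filename, samples) from perf record output, or None."""
    m = _SUMMARY.search(text)
    if m is None:
        return None
    return float(m["size"]), m["file"], int(m["samples"])
```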

Intel PMU event for L1 cache hit event

丶灬走出姿态, submitted 2019-12-22 15:37:13
Question: I'm trying to count the number of cache hits at each cache level (L1, L2 and L3) for a program on an Intel Haswell processor. I wrote a program that counts L2 and L3 cache hits by monitoring the respective events. To do that, I checked the Intel x86 Software Developer's Manual and used the cache_all_request and cache_miss events for the L2 and L3 caches. However, I didn't find the events for the L1 cache. Maybe I missed something? My questions are: Which Event Number and UMASK
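Once an Event Number and UMASK are found in the SDM tables, perf's raw `rNNNN` syntax takes them as a single hex config value. On Intel cores, the low byte is the event number and the next byte is the UMASK. This is the layout that /sys/bus/event_source/devices/cpu/format/ describes. This is a minimal sketch, and `raw_event` is a hypothetical helper. For instance, event 0xD0 with UMASK 0x81 gives `r81d0`, a code used in another question on this page. The Haswell tables list MEM_LOAD_UOPS_RETIRED.L1_HIT as event D1H, UMASK 01H. Check that against the SDM for your exact CPU before relying on it.

```python
def raw_event(event: int, umask: int, cmask: int = 0) -> str:
    """Build a perf raw event string ('r' + hex config) from SDM table fields.

    Assumes the Intel core-PMU layout: event select in bits 0-7, UMASK in
    bits 8-15 and the optional counter mask in bits 24-31.
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    config = event | (umask << 8) | (cmask << 24)
    return f"r{config:x}"
```

The returned string can be passed directly to `perf stat -e`.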

What does “perf stat” output mean?

坚强是说给别人听的谎言, submitted 2019-12-22 09:25:49
Question: I use the "perf stat" command to collect statistics for some events:

    [root@root test]# perf stat -a -e "r81d0","r82d0" -v ./a
    r81d0: 71800964 1269047979 1269006431
    r82d0: 26655201 1284214869 1284214869

     Performance counter stats for './a':

         71,800,964 r81d0    [100.00%]
         26,655,201 r82d0

        0.036892349 seconds time elapsed

(1) I know 71800964 is the count of "r81d0", but what is the meaning of 1269047979 and 1269006431?
(2) What is the meaning of "[100.00%]"?
I have tried "perf stat --help", but
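In verbose mode, perf stat prints each counter as `name: count time_enabled time_running`, with both times in nanoseconds. When the kernel multiplexes more events than there are hardware counters, running is less than enabled. perf then reports running/enabled as the bracketed percentage and extrapolates the count by enabled/running. This is a minimal sketch of that arithmetic, and `scale` is a hypothetical helper. It uses integer math for the extrapolation. perf's own rounding may differ slightly.

```python
from typing import Optional, Tuple


def scale(value: int, enabled: int, running: int) -> Tuple[Optional[int], float]:
    """Return (extrapolated_count, percent_of_time_running) for one counter.

    `enabled` and `running` are the two times perf stat -v prints after the
    count. If the counter was multiplexed (running < enabled), the count is
    extrapolated by enabled/running; integer math avoids float rounding.
    """
    if running == 0:
        return None, 0.0  # never scheduled on a hardware counter
    extrapolated = (value * enabled + running // 2) // running  # rounded
    return extrapolated, 100.0 * running / enabled
```

For the r81d0 line, `scale(71800964, 1269047979, 1269006431)` gives a running percentage that formats as `100.00`, matching the `[100.00%]` perf printed. The counter was scheduled for all but about 0.003% of the time it was enabled.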

Perf startup overhead: Why does a simple static executable which performs MOV + SYS_exit have so many stalled cycles (and instructions)?

馋奶兔, submitted 2019-12-22 03:35:23
Question: I'm trying to understand how to measure performance and decided to write a very simple program:

    section .text
    global _start
    _start:
        mov rax, 60
        syscall

I ran the program with perf stat ./bin. What surprised me is that stalled-cycles-frontend was so high:

    0.038132 task-clock (msec)   # 0.148 CPUs utilized
    0 context-switches           # 0.000 K/sec
    0 cpu-migrations             # 0.000 K/sec
    2 page-faults                # 0.052 M/sec
    107,386 cycles               # 2.816 GHz
    81,229 stalled-cycles-frontend # 75.64% frontend cycles
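The annotations after `#` are ratios derived from the counts. Recomputing them from the values in that output is a quick check of what they mean. In this minimal sketch, GHz is taken as cycles divided by task-clock time, the frontend percentage as stalled-cycles-frontend over cycles, and the fault rate as page-faults per task-clock second. These formulas reproduce the printed 2.816 GHz, 75.64% and 0.052 M/sec. `derived_metrics` is a hypothetical helper.

```python
from typing import Dict


def derived_metrics(task_clock_ms: float, cycles: int, stalled_frontend: int,
                    page_faults: int) -> Dict[str, float]:
    """Recompute the '#' annotations printed next to these perf stat counters."""
    seconds = task_clock_ms / 1000.0
    return {
        "ghz": cycles / seconds / 1e9,                            # cycles per task-clock second
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,  # share of cycles stalled
        "faults_m_per_sec": page_faults / seconds / 1e6,          # faults per task-clock second
    }
```

Because the whole run lasts only about 38 microseconds of task-clock, a handful of cold-start events dominate every ratio.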