perf

Building Perf with Babeltrace (for Perf to CTF Conversion)

风格不统一 submitted on 2019-12-03 20:14:36
I am trying to use TraceCompass to investigate my system trace further. That requires the CTF format, and as far as I know there are two ways to obtain it on Linux: (1) tracing with LTTng and using the CTF output it produces, or (2) using 'perf data convert' to create CTF data from perf.data. I have been trying the second option, because the first requires installing tracepoints and what I get from perf is enough for me. So, assuming my perf.data is available, running perf data convert --to-ctf=./ctf resulted in: No version support compiled in. Digging into the

Why does Perf and Papi give different values for L3 cache references and misses?

余生颓废 submitted on 2019-12-03 12:47:41
Question: I am working on a project where we have to implement an algorithm that is proven in theory to be cache friendly. In simple terms, if N is the input size and B is the number of elements transferred between the cache and RAM on every cache miss, the algorithm will require O(N/B) accesses to RAM. I would like to show that this is indeed the behavior in practice. To better understand how one can measure various cache-related hardware counters, I decided to use different
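The O(N/B) claim above can be turned into a number to compare against measured miss counts. The following is a minimal sketch, not the asker's method: the function names are hypothetical, and the example value B = 16 assumes 64-byte cache lines holding 4-byte elements, which the excerpt does not state.

```python
def expected_block_transfers(n_elements: int, block_elements: int) -> int:
    """Block transfers needed to touch n_elements once, B elements per transfer."""
    if n_elements < 0 or block_elements <= 0:
        raise ValueError("n_elements must be >= 0 and block_elements > 0")
    # Integer ceiling of N / B, avoiding floating-point rounding.
    return -(-n_elements // block_elements)


def miss_ratio(measured_misses: int, n_elements: int, block_elements: int) -> float:
    """Measured misses divided by the N/B estimate.

    A ratio that stays bounded as N grows is consistent with O(N/B) behavior.
    """
    expected = expected_block_transfers(n_elements, block_elements)
    if expected == 0:
        raise ValueError("no transfers expected for an empty input")
    return measured_misses / expected


if __name__ == "__main__":
    # Assumed example: one million 4-byte elements, 16 elements per cache line.
    print(expected_block_transfers(1_000_000, 16))
```

Running the same comparison for several input sizes, using the miss counts reported by each tool, shows whether the measured values track N/B or diverge from it.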

Thread Utilization profiling on linux

∥☆過路亽.° submitted on 2019-12-03 11:58:35
Question: Linux perf-tools are great for finding hotspots in CPU cycles and optimizing those hotspots. But once some parts are parallelized, it becomes difficult to spot the sequential parts, since they take up significant wall time but not necessarily many CPU cycles (the parallel parts are already burning those). To avoid the XY-problem: my underlying motivation is to find sequential bottlenecks in multi-threaded code. The parallel phases can easily dominate the aggregate CPU-cycle statistics even

Best proven ways to improve search performance [closed]

Anonymous (unverified) submitted on 2019-12-03 10:24:21
Question: We are working on a website like www.justdial.com. Every State has Districts, which have Cities, which have Categories and subcategories. Kindly suggest proven ways to improve performance. Answer 1: Add tags. Create a shorter standalone static index for popular queries. Create prepared static cached content for popular queries. Move (partially sort) popular queries to the beginning of the index. Optimize SQL queries with LIMIT, reduce the number of JOINs, and reduce the number of string functions. Source: Best proven ways to improve search performance [closed]

How to catch the L3-cache hits and misses by perf tool in Linux

南笙酒味 submitted on 2019-12-03 10:08:05
Is there any way to catch L3-cache hits and misses with the perf tool in Linux? According to the output of perf list cache, the L1 and LLC caches are supported. According to the definition of the perf_evsel__hw_cache array in perf's source code:
    const char *perf_evsel__hw_cache[PERF_COUNT_HW_CACHE_MAX]
                                    [PERF_EVSEL__MAX_ALIASES] = {
        { "L1-dcache", "l1-d", "l1d", "L1-data", },
        { "L1-icache", "l1-i", "l1i", "L1-instruction", },
        { "LLC", "L2", },
        { "dTLB", "d-tlb", "Data-TLB", },
        { "iTLB", "i-tlb", "Instruction-TLB", },
        { "branch", "branches", "bpu", "btb", "bpc", },
        { "node", },
    };
LLC is an alias to L2
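To make the alias table quoted above easier to reason about, here is a small lookup sketch. It only mirrors the quoted array; the function name is hypothetical, and the case-insensitive matching is a simplifying assumption for illustration, not a statement about how perf's own event parser matches names.

```python
from typing import Optional

# Rows copied from the perf_evsel__hw_cache array quoted in the question.
# The first entry of each row is treated as the canonical name.
HW_CACHE_ALIASES = [
    ["L1-dcache", "l1-d", "l1d", "L1-data"],
    ["L1-icache", "l1-i", "l1i", "L1-instruction"],
    ["LLC", "L2"],
    ["dTLB", "d-tlb", "Data-TLB"],
    ["iTLB", "i-tlb", "Instruction-TLB"],
    ["branch", "branches", "bpu", "btb", "bpc"],
    ["node"],
]


def canonical_cache_name(name: str) -> Optional[str]:
    """Return the canonical name for an alias, or None if the table has no match."""
    wanted = name.lower()  # assumption: compare case-insensitively
    for aliases in HW_CACHE_ALIASES:
        if any(wanted == alias.lower() for alias in aliases):
            return aliases[0]
    return None


if __name__ == "__main__":
    for candidate in ("L2", "l1d", "L3"):
        print(candidate, "->", canonical_cache_name(candidate))
```

In this table "L2" resolves to the LLC row, while "L3" has no entry of its own, which matches the observation in the question.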

ARM compilation error, VFP registered used by executable, not object file

Anonymous (unverified) submitted on 2019-12-03 09:05:37
Question: I have been having this problem for the last few days, and I can't get my head around what is really happening here or what the problem is. I have a makefile with these flags:
    CC = arm-linux-gnueabihf-gcc-4.6
    FLAGS = -O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp -std=gnu99
I have a library in a .a file, which contains some object files; all I need to do is link them into my executable. I know the prototypes and all that. The only thing that complains is the following: /usr/bin/ld: error: *EXECUTABLE* uses

how to change perf_event_open max sample rate

Anonymous (unverified) submitted on 2019-12-03 08:57:35
Question: I'm using perf_event_open to get samples. I am trying to capture every hit of a point, but perf_event_open is not fast enough. I tried to change the sample rate using the command below: echo 10000000 > /proc/sys/kernel/perf_event_max_sample_rate But it looks like the value I set was too large. After running my code, perf_event_max_sample_rate is changed back to a lower value such as 12500. And when I try larger values, for example 20000000 or 50000000, the sampling speed does not increase accordingly. Is there any way to change perf

ARM Cortex-a9 event counters return 0

Anonymous (unverified) submitted on 2019-12-03 08:52:47
Question: I'm currently trying to use the event counters on an ARM Cortex-A9 (on a Xilinx Zynq EPP) to count cycles. I've adapted some example code from ARM for this purpose. I'm programming this bare-metal with the GNU ARM EABI compiler. The way I understand the use of the PMU is that you first have to enable the PMU:
    void enable_pmu (void){
        asm volatile(
            "MRC p15, 0, r0, c9, c12, 0\n\t"
            "ORR r0, r0, #0x01\n\t"
            "MCR p15, 0, r0, c9, c12, 0\n\t"
        );
    }
Then you configure the performance counter to count a certain type of event ( 0x11 for cycles in

perf mem -D report

Anonymous (unverified) submitted on 2019-12-03 07:50:05
Question: I was using perf mem -t load record "commands" to profile system memory access latency. Afterwards, I ran perf mem -D report and got the following results:
    [root@mdtm-server wenji]# perf mem -D report
    # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
    2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
    2054 2054 0xffffffff81321d6e 0xffff880c7fc87d44 7 0x68100142 /lib/modules/3.12.23/build/vmlinux:ghes_copy_tofrom_phys
What do "ADDR", "DSRC", and "SYMBOL" mean? Answer 1: IP - PC of the
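The dump format shown above can be read into named fields, which makes the columns easier to inspect. This is a minimal sketch based only on the two sample rows in the question: the names MemSample and parse_mem_sample are hypothetical, and treating IP, ADDR, and DSRC as hexadecimal, LOCAL WEIGHT as decimal, and SYMBOL as "object:symbol" are assumptions inferred from those rows.

```python
from typing import NamedTuple, Optional


class MemSample(NamedTuple):
    pid: int
    tid: int
    ip: int            # assumed hexadecimal in the dump
    addr: int          # assumed hexadecimal in the dump
    local_weight: int  # assumed decimal in the dump
    dsrc: int          # assumed hexadecimal in the dump
    dso: str           # part of SYMBOL before the last ':'
    symbol: str        # part of SYMBOL after the last ':'


def parse_mem_sample(line: str) -> Optional[MemSample]:
    """Parse one data row; return None for blank lines and '#' header lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    pid, tid, ip, addr, weight, dsrc, location = text.split(None, 6)
    dso, _, symbol = location.rpartition(":")
    return MemSample(int(pid), int(tid), int(ip, 16), int(addr, 16),
                     int(weight), int(dsrc, 16), dso, symbol)


if __name__ == "__main__":
    row = ("2054 2054 0xffffffff81321d6e 0xffff880c7fc87d44 7 0x68100142 "
           "/lib/modules/3.12.23/build/vmlinux:ghes_copy_tofrom_phys")
    print(parse_mem_sample(row))
```

With the rows parsed, samples can be sorted or grouped by local_weight or symbol to see which accesses carry the highest reported weight.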

How do you get debugging symbols working in linux perf tool inside Docker containers?

夙愿已清 submitted on 2019-12-03 06:57:12
I am using Docker containers based on the "ubuntu" tag and cannot get the Linux perf tool to display debugging symbols. Here is what I'm doing to demonstrate the problem. First I start a container, here with an interactive shell:
    docker run -t -i ubuntu:14.04 /bin/bash
Then, from the container prompt, I install the Linux perf tool:
    apt-get update
    apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
I can now use the perf tool. My kernel is 3.16.0-77-generic. Now I'll install gcc, compile a test program, and try to run it under perf record:
    apt-get install -y gcc
I paste in