profiling

Matlab tic toc accuracy

前提是你 submitted on 2019-12-10 01:13:50
Question: I'm measuring some code in a loop:

    fps = zeros(1, 100);
    for i=1:100
        t = tic;
        I = fetch_image_from_source(); % function to get image
        fps(i) = 1./toc(t);
    end
    plot(fps);

And I get an average of 50 fps. Then I'd like to add imshow() to my code. I understand that imshow is very slow, but I won't include imshow inside the tic-toc commands:

    fps = zeros(1, 100);
    figure;
    for i=1:100
        t = tic;
        I = fetch_image_from_source(); % function to get image
        fps(i) = 1./toc(t);
        imshow(I);
        drawnow;
    end
    plot(fps);

And I get fps …

Capture callstack and events in Xperf

天涯浪子 submitted on 2019-12-09 23:44:55
Question: Sorry about the dumb question. I am new to Xperf. I am on 64-bit Windows 8.1 and my application is also x64. I want to capture both the callstacks and my defined events in the application using Xperf. I registered the GUID 35f7872e-9b6d-4a9b-a674-66f1edd66d5c in my application. When I was using:

    xperf -on PROC_THREAD+LOADER+Base -start UserSession -on 35f7872e-9b6d-4a9b-a674-66f1edd66d5c -BufferSize 1024 -stackwalk profile

I could get all the events but no callstack. However, if I remove -on …

Is it possible to use vtune on certain code snippets in a binary and not an entire binary?

[亡魂溺海] submitted on 2019-12-09 23:01:11
Question: I am adding usage of a small library to a large existing piece of software and would like to analyze, in finer detail than just in/out rdtsc() or gettimeofday calls, the overhead of the small library and its attribution. Using things like rdtsc() I can get a sense of the latency that calling my library's functions has, but I cannot do latency attribution unless I am also able to see whether branches are not being predicted well, caching isn't working properly, etc. I looked into PAPI as I …

Can't sample hardware cache events with linux perf

早过忘川 submitted on 2019-12-09 19:25:17
Question: For some reason, I can't sample (perf record) hardware cache events:

    # perf record -e L1-dcache-stores -a -c 100 -- sleep 5
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.607 MB perf.data (~26517 samples) ]
    # perf script

but I can count them (perf stat):

    # perf stat -e L1-dcache-stores -a -- sleep 5

     Performance counter stats for 'sleep 5':

           711,781 L1-dcache-stores

       5.000842990 seconds time elapsed

I tried on different CPUs, OS versions (and kernel …

Why does gprof significantly underestimate the program's running time?

笑着哭i submitted on 2019-12-09 18:20:41
Question: I have this program that takes 2.34 seconds to run, and gprof says it only takes 1.18 seconds. I've read answers elsewhere suggesting that gprof can get it wrong if, e.g., the program is I/O-bound, but this program clearly isn't. This also happens for a useful program I'm trying to profile; it's not specific to this trivial test case. (Also, in this case gprof says that main() takes more than 100% of the program's running time, which is a pretty stupid bug but not really causing problems for me.)

How to profile combined Python and C code

亡梦爱人 submitted on 2019-12-09 17:17:32
Question: I have an application that consists of multiple Python scripts. Some of these scripts call C code. The application is now running much slower than it was, so I would like to profile it to see where the problem lies. Is there a tool, software package, or just a way to profile such an application? A tool that will follow the Python code into the C code and profile these calls as well? Note 1: I am well aware of the standard Python profiling tools. I'm specifically looking here for …
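For context on why the standard tools alone fall short here, a minimal sketch (Python 3, using zlib.compress from the standard library as a stand-in for an arbitrary call into C) shows that cProfile reports time spent in a C-implemented function as a single opaque entry; attributing time inside the C code itself requires a native profiler attached to the same process.

    # A minimal sketch of the Python-side view: cProfile attributes all time spent
    # inside a C-implemented function (here zlib.compress) to that single call,
    # with no visibility into the C code itself.
    import cProfile
    import pstats
    import zlib

    def compress_blocks(n):
        payload = b"x" * 1_000_000
        return [zlib.compress(payload) for _ in range(n)]

    if __name__ == "__main__":
        profiler = cProfile.Profile()
        profiler.enable()
        compress_blocks(20)
        profiler.disable()
        # zlib.compress shows up as one aggregated row; breaking down the time
        # inside the C side needs a native profiler run against the same process.
        pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)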

Chrome timeline - how can I determine the cause of a “Recalculate Style” log entry?

拟墨画扇 submitted on 2019-12-09 14:05:40
Question: Profiling a page with the built-in timeline recorder in Chrome, I see repeated "Recalculate Style" entries. They have no obvious information linking them to a DOM element or event. How can I best determine the cause of these entries?
Answer 1: My advice to you would be to use the Chrome Canary build of Chrome. Paul Irish has a good demo of using the Timeline in Chrome Dev Tools here. You can simply click on the event, for instance 'Recalculate Style', and you should get a miniature stack trace …

Memory profiler for .NET Compact Framework

假装没事ソ submitted on 2019-12-09 10:45:09
Question: Is there a tool I could use for profiling (memory) a .NET Compact Framework 3.5 application (Windows Mobile)? Thanks!
Answer 1: Use the Remote Performance Monitor that comes with Studio. It gives snapshots of the GC heap, traceable roots and much more.
Answer 2: EQATEC supports .NET CF 3.5.
Answer 3: The CLR Profiler also comes with the CF SDK, and allows viewing the heap of a process. In contrast to Remote Performance Monitor, it doesn't crash all the time ;-)
Source: https://stackoverflow.com/questions/1048939

Profilers: Instrumenting vs Sampling

回眸只為那壹抹淺笑 submitted on 2019-12-09 09:43:31
Question: I am doing a study comparing profilers, mainly instrumenting and sampling ones. I have come up with the following info:
- sampling: stop the execution of the program, take the PC, and thus deduce where the program is
- instrumenting: add some overhead code to the program so it increments some counters, to track what the program is doing
If the above info is wrong, correct me. After this I was looking at the time of execution, and some said that instrumenting takes more time than sampling! Is this correct? If yes, why is that …
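To make the two terms concrete, here is a rough Python sketch of both ideas (Unix-only because of the signal-based timer, and purely illustrative: real profilers work at a much lower level). The instrumented path pays a wrapper cost on every single call, which is why instrumentation generally adds more overhead than periodic sampling.

    # Rough illustration only: instrumentation wraps every call and times it;
    # sampling interrupts the program periodically and records where it was.
    # Requires a Unix-like OS for signal.setitimer/SIGPROF.
    import random
    import signal
    import time
    from collections import Counter

    samples = Counter()      # sampling: how often each function was "caught"
    call_time = Counter()    # instrumenting: exact time charged per function

    def sample_handler(signum, frame):
        # Record the function the program happened to be in when the timer fired.
        samples[frame.f_code.co_name] += 1

    def instrumented(func):
        # Wrap the function so every single call is timed (this is the overhead).
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                call_time[func.__name__] += time.perf_counter() - start
        return wrapper

    @instrumented
    def busy_work():
        return sum(random.random() for _ in range(50_000))

    if __name__ == "__main__":
        signal.signal(signal.SIGPROF, sample_handler)
        signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)   # sample every 10 ms of CPU time
        for _ in range(200):
            busy_work()
        signal.setitimer(signal.ITIMER_PROF, 0, 0)         # stop sampling
        print("sampling  :", dict(samples))
        print("instrument:", dict(call_time))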

Why are __getitem__(key) and get(key) significantly slower than [key]?

蓝咒 submitted on 2019-12-09 09:18:44
Question: It was my understanding that brackets were nothing more than a wrapper for __getitem__. Here is how I benchmarked this: First, I generated a semi-large dictionary.

    items = {}
    for i in range(1000000):
        items[i] = 1

Then, I used cProfile to test the following three functions:

    def get2(items):
        for k in items.iterkeys():
            items.get(k)

    def magic3(items):
        for k in items.iterkeys():
            items.__getitem__(k)

    def brackets1(items):
        for k in items.iterkeys():
            items[k]

The results looked like so:

    1000004 …
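For a quick reproduction outside cProfile, a timeit comparison shows the same ordering (sketched in Python 3, so keys() replaces iterkeys()): d[k] compiles to a single subscript opcode, while d.get(k) and d.__getitem__(k) each pay for an attribute lookup plus a method-call dispatch on every iteration, which is where the extra time goes.

    # Small timeit-based reproduction (Python 3). The exact numbers vary by
    # interpreter and machine; the relative ordering is the point.
    import timeit

    items = {i: 1 for i in range(1_000_000)}

    def brackets(d):
        for k in d.keys():
            d[k]                 # single subscript opcode

    def get_method(d):
        for k in d.keys():
            d.get(k)             # attribute lookup + method call per iteration

    def dunder(d):
        for k in d.keys():
            d.__getitem__(k)     # same extra dispatch cost as get()

    for fn in (brackets, get_method, dunder):
        t = timeit.timeit(lambda: fn(items), number=10)
        print(f"{fn.__name__:12s} {t:.3f}s")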