profiling | 易学教程

Profiling a (possibly I/O-bound) process to reduce latency

Question: I want to improve the performance of a specific method inside a larger application. The goal is improving latency (wall-clock time spent in a specific function), not (necessarily) system load. Requirements: As I expect a lot of the latency to be due to I/O, take into account time spent waiting/blocked (in other words: look at wall-clock time instead of CPU time). As the program does much more than the fragment I'm trying to optimize, there needs to be a way to either start/stop profiling
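The two requirements in this excerpt (measure wall-clock rather than CPU time, and restrict measurement to one fragment of a larger program) can be illustrated with a minimal Python sketch. This is not taken from the question; `region_timer` and `simulated_io_call` are hypothetical names, and the `time.sleep` call only stands in for a blocking I/O operation.

```python
import time
from contextlib import contextmanager


@contextmanager
def region_timer(results, label):
    """Record wall-clock and CPU time spent inside the with-block only."""
    wall_start = time.perf_counter()   # wall-clock: includes time spent blocked
    cpu_start = time.process_time()    # CPU time: excludes sleeping/waiting
    try:
        yield
    finally:
        results[label] = {
            "wall": time.perf_counter() - wall_start,
            "cpu": time.process_time() - cpu_start,
        }


def simulated_io_call():
    # Hypothetical stand-in for a blocking I/O operation.
    time.sleep(0.05)


timings = {}
# Measurement starts and stops around the fragment of interest;
# the rest of the program runs unmeasured.
with region_timer(timings, "simulated_io_call"):
    simulated_io_call()

print(timings)
```

For an I/O-bound region the `wall` figure is typically much larger than the `cpu` figure, and that gap is what a CPU-time-only profiler would miss.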

Difference in performance of compiled accelerate code ran from ghci and shell

Question: Problem. Hello, I'm using the accelerate library to create an application that lets the user interactively call functions that process images, which is why I'm building on and extending ghci using the GHC API. The problem is that when running the compiled executable from the shell, the computations finish in under 100 ms (slightly less than 80), while running the same compiled code within ghci takes over 100 ms (on average a bit more than 140). Resources: sample code + execution logs: https:/

What is profiling?

Question: I am new to this and am trying to learn. What is profiling? What are various free tools for profiling .NET, Java EE? Can JavaScript be profiled? If so, by which tool? And lastly, how do these profilers work?

Answer 1: Profiling measures how long various parts of the code take to run. JavaScript can be profiled with Firebug: http://getfirebug.com/js.html

Answer 2: Profiling is measuring the execution times and correlating them with various classes/methods/functions. (see the link I gave to the wikipedia
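As a small illustration of what the answers describe (measuring execution time and attributing it to functions), here is a sketch using Python's standard-library `cProfile` and `pstats` modules. Python is not one of the platforms the question asks about; it is used here only because the example is short. The functions `parse` and `compute` are made up for the demonstration.

```python
import cProfile
import pstats


def parse(n):
    # Hypothetical workload: build a list of strings.
    return [str(i) for i in range(n)]


def compute(n):
    # Calls parse(), so its cumulative time includes parse()'s time.
    return sum(len(s) for s in parse(n))


profiler = cProfile.Profile()
profiler.enable()          # start collecting timing data
result = compute(20000)
profiler.disable()         # stop collecting

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
per_function = {
    func_name: (call_count, cumulative)
    for (_file, _line, func_name), (_prim, call_count, _own, cumulative, _callers)
    in stats.stats.items()
}

for name in ("compute", "parse"):
    calls, cumulative = per_function[name]
    print(f"{name}: {calls} call(s), {cumulative:.6f}s cumulative")
```

The report correlates time with individual functions: `compute` shows a cumulative time at least as large as `parse`, because the time spent in a callee is included in its caller's cumulative figure.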

Looking for a low impact c++ profiler

Question: I am looking for a low-impact, OS-independent profiler for C++ code. When I say low impact, I am referring to something less intrusive than valgrind. I plan to use it in a MIPS-based embedded environment (hence the OS independence). I tried a ported version of valgrind, and it completely changed the performance characteristics (way too much Heisenberg principle at work), so I can't go that route. We know the memory bus speed is a bottleneck, which most likely explains why valgrind was so

How do I profile the EDT in Swing?

Question: I have an application that I'm building in Swing. It has a scrollable and zoomable chart component that I can pan and zoom. The whole thing is smooth, except that sometimes the UI will pause for about 750 ms and I don't know why. This doesn't always happen, but sometimes something happens in the application and it starts pausing like this once every 6-8 seconds. It seems pretty clear that there's some event being placed on the EDT that's taking 750 ms or so to run, which shouldn't be

Haskell fast concurrent queue

Question: The Problem. Hello! I'm writing a logging library and I would love to create a logger that runs in a separate thread, while all application threads just send messages to it. I want to find the most performant solution for this problem. I need a simple unbounded queue here. Approaches: I've created some tests to see how the available solutions perform, and I get very strange results. I tested 4 implementations (source code provided below) based on: pipes-concurrency Control.Concurrent.Chan

Interactive Python: cannot get `%lprun` to work, although line_profiler is imported properly

Question: Problem. Most IPython "magic functions" work fine for me right off the bat: %hist, %time, %prun, etc. However, I noticed that %lprun could not be found with IPython as I'd originally installed it. Attempt to Resolve: I then discovered that I should install the line_profiler module. I have installed this module, but still cannot seem to get the magic function to work correctly. If I attempt to call %lprun, IPython still cannot find the function. If I call it with the full name ( line

Java heap profiling crashes with SIGABRT

Question: I'm trying to profile native memory allocated by methods written in C and plugged into the JVM through JNI. I installed valgrind:
$ valgrind --version
valgrind-3.13.0
and tried to run the JVM with the following options:
valgrind --tool=massif --massif-out-file=/tmp/massif-j.out java -XX:+UnlockDiagnosticVMOptions //...
The thing is, it crashes and creates a core dump:
0x00000000080e4196: fxrstor64 (%rsp)
0x00000000080e419b: add $0x200,%rsp
0x00000000080e41a2: mov (%rsp),%r15
0x00000000080e41a6: mov 0x8(%rsp),%r14

intel Pin: analysis routine detects ah register instead of rsp (REG_STACK_PTR)

Question: I asked this question a few days ago. I wanted to get the stack allocation size (after the function creation). The answer suggests doing: if((INS_Opcode(ins) == XED_ICLASS_ADD || INS_Opcode(ins) == XED_ICLASS_SUB) && REG(INS_OperandReg(ins, 0)) == REG_STACK_PTR && INS_OperandIsImmediate(ins, 1)) Which in theory is correct and does make sense. But it doesn't work in practice (correct me if I'm wrong here). It works perfectly fine if I remove the REG(INS_OperandReg(ins, 0)) == REG_STACK_PTR check.

perf report shows this function “__memset_avx2_unaligned_erms” has overhead. does this mean memory is unaligned?

Question: I am trying to profile my C++ code using the perf tool. The implementation contains code with SSE/AVX/AVX2 instructions. In addition, the code is compiled with the -O3 -mavx2 -march=native flags. I believe the __memset_avx2_unaligned_erms function is a libc implementation of memset. perf shows that this function has considerable overhead. The function name indicates that memory is unaligned; however, in the code I am explicitly aligning the memory using the GCC built-in macro __attribute__((aligned (x))) What