Question
I only have a rough idea about this, so I would like some more practical ideas. Ideas for Linux, Unix, and Windows are all welcome.
The rough idea in my head is:
The profiler sets up some type of timer and a timer-interrupt handler in the target process. When the handler takes control, it reads and saves the value of the instruction pointer register. When sampling is done, it counts the occurrences of every sampled IP value, and from that we know the 'top hitters' among all sampled program addresses.
But I do not actually know how to do it. Can someone give me some basic but practical ideas? For example, what kind of timer (or equivalent) is typically used? How do I read the IP register value? And so on. (I think that when execution enters the profiler's handler routine, the IP will be pointing at the entry of the handler, not somewhere in the target program, so we cannot simply read the current IP value.)
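A minimal sketch of how this in-process idea can be wired up, assuming Linux/x86-64 and glibc (the handler name, buffer size, and 10 ms interval here are arbitrary choices): setitimer(2) delivers SIGPROF periodically, and the handler reads the interrupted instruction pointer out of the ucontext the kernel passes it, which is the saved IP of the profiled code rather than the handler's own.

    /* In-process IP sampling sketch, assuming Linux/x86-64 and glibc.
     * SIGPROF fires every 10 ms of consumed CPU time; the handler records
     * the saved instruction pointer of the interrupted code. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <ucontext.h>

    #define MAX_SAMPLES 100000
    static volatile sig_atomic_t nsamples;
    static uintptr_t samples[MAX_SAMPLES];

    static void on_sigprof(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info;
        ucontext_t *uc = ctx;
        if (nsamples < MAX_SAMPLES)
            samples[nsamples++] = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_sigprof;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sigaction(SIGPROF, &sa, NULL);

        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };  /* 10 ms period */
        setitimer(ITIMER_PROF, &it, NULL);

        /* ...the code being profiled runs here... */
        for (volatile unsigned long i = 0; i < 300000000UL; i++) ;

        struct itimerval off = { { 0, 0 }, { 0, 0 } };
        setitimer(ITIMER_PROF, &off, NULL);                    /* stop sampling */

        for (int i = 0; i < nsamples; i++)                     /* dump raw IPs */
            printf("%#lx\n", (unsigned long)samples[i]);
        return 0;
    }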
Thank you for your answer!
Thanks for the answers from Peter Cordes and Mike Dunlavey.
Peter's answer explains how to read the registers and memory of another process. Now I realize that the profiler does not have to execute 'inside' the target process; instead, it can read the target's registers/memory from outside using ptrace(2). It does not even have to suspend the target itself, since ptrace stops it anyway.
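For example, an out-of-process sampling sketch using ptrace(2), assuming Linux/x86-64 (the 100 samples and 10 ms spacing are arbitrary, and attaching to an unrelated process may require CAP_SYS_PTRACE or a relaxed kernel.yama.ptrace_scope):

    /* Out-of-process IP sampling via ptrace(2), assuming Linux/x86-64.
     * Each iteration attaches (which stops the target), reads its register
     * set, records RIP, then detaches so the target runs until the next
     * sample. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }
        pid_t pid = (pid_t)atoi(argv[1]);

        for (int i = 0; i < 100; i++) {                  /* take 100 samples */
            if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
                perror("PTRACE_ATTACH");
                return 1;
            }
            waitpid(pid, NULL, 0);                       /* wait for the stop */

            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);
            printf("%#llx\n", (unsigned long long)regs.rip);

            ptrace(PTRACE_DETACH, pid, NULL, NULL);      /* let it run again */
            usleep(10000);                               /* ~10 ms between samples */
        }
        return 0;
    }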
Mike's answer suggests that, for performance profiling, counting the occurrences of whole stack traces makes more sense than counting bare IP register values, since the latter can be mostly noise when execution happens to be inside a system module at the moment of sampling.
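A rough sketch of that stack-counting idea, assuming Linux/glibc and using backtrace() purely for illustration (a real profiler would unwind frames itself or via libunwind, and would map addresses to functions or source lines before grouping so samples inside the same function coalesce):

    /* Counting whole call-stack samples instead of bare IP values, assuming
     * Linux/glibc.  A SIGPROF handler records the stack with backtrace();
     * afterwards identical stacks are grouped and their counts printed.
     * backtrace() is not formally async-signal-safe; this only shows the
     * principle.  Compile with -g -rdynamic for readable symbol names. */
    #define _GNU_SOURCE
    #include <execinfo.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>

    #define MAX_DEPTH   16
    #define MAX_SAMPLES 4096

    static void *stacks[MAX_SAMPLES][MAX_DEPTH];
    static int   depth[MAX_SAMPLES];
    static volatile sig_atomic_t nsamples;

    static void on_sigprof(int sig)
    {
        (void)sig;
        if (nsamples < MAX_SAMPLES) {
            depth[nsamples] = backtrace(stacks[nsamples], MAX_DEPTH);
            nsamples++;
        }
    }

    static volatile double sink;
    static void burn(long n) { while (n--) sink += n * 0.5; }
    static void hot(void)    { burn(4000000); }   /* should dominate the samples */
    static void cold(void)   { burn(100000); }

    int main(void)
    {
        void *warmup[2];
        backtrace(warmup, 2);           /* force libgcc to load outside the handler */

        signal(SIGPROF, on_sigprof);
        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };   /* every 10 ms of CPU */
        setitimer(ITIMER_PROF, &it, NULL);

        for (int i = 0; i < 200; i++) { hot(); cold(); }

        /* Group identical stacks (simple O(n^2) pass) and report counts. */
        static int counted[MAX_SAMPLES];
        for (int i = 0; i < nsamples; i++) {
            if (counted[i]) continue;
            int count = 1;
            for (int j = i + 1; j < nsamples; j++)
                if (depth[j] == depth[i] &&
                    memcmp(stacks[j], stacks[i], (size_t)depth[i] * sizeof(void *)) == 0) {
                    counted[j] = 1;
                    count++;
                }
            printf("---- %d of %d samples:\n", count, (int)nsamples);
            backtrace_symbols_fd(stacks[i], depth[i], 1);   /* symbols to stdout */
        }
        return 0;
    }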
Thank you guys so much!
Answer 1:
Good for you for wanting to do this. Advice - don't try to mimic gprof.
What you need to do is sample the call stack, not just the IP, at random or pseudo-random times.
First reason - I/O and system calls can be deeply buried in the app and be costing a large fraction of the time, during which the IP is meaningless but the stack is meaningful. ("CPU profilers" simply shut their eyes.)
Second reason - Looking at the IP is like trying to understand a horse by looking at the hairs on its tail. To analyze performance of a program you need to know why the time is spent, not just that it is. The stack tells why.
Another problem with gprof is it made people think you need lots of samples - the more the better - for statistical precision. But that assumes you're looking for needles in a haystack, the removal of which saves next to nothing - in other words you assume (attaboy/girl programmer) there's nothing big in there, like a cow under the hay. Well, I've never seen software that didn't have cows in the hay, and it doesn't take a lot of samples to find them.
How to get samples: having a timer interrupt and reading the stack (in binary) is just a technical problem. I figured out how to do it a long time ago. So can you. Every debugger does it. But to turn it into code names and locations requires a map file or something like it, which usually means a debug build (not optimized). You can get a map file from optimized code, but the optimizer has scrambled the code so it's hard to make sense of.
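For instance, one minimal way to turn a sampled address into a name on Linux with glibc is dladdr(), which only sees the dynamic symbol table, so the binary has to be built with -rdynamic; a real profiler reads the debug info or a map file as described above. A sketch, with a stand-in function so it runs on its own:

    /* Mapping a sampled address to a symbol name via dladdr() (a GNU
     * extension).  Compile with -g -rdynamic (and -ldl on older glibc). */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    void some_profiled_function(void) { }   /* stand-in for sampled code */

    static void print_location(void *addr)
    {
        Dl_info info;
        if (dladdr(addr, &info) && info.dli_sname)
            printf("%p = %s+%#lx (in %s)\n", addr, info.dli_sname,
                   (unsigned long)((char *)addr - (char *)info.dli_saddr),
                   info.dli_fname);
        else
            printf("%p = ?? (no symbol found)\n", addr);
    }

    int main(void)
    {
        /* Pretend this address came out of a stack sample. */
        print_location((void *)some_profiled_function);
        return 0;
    }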
Is it worthwhile taking samples in non-optimized code? I think so, because there are two kinds of speedups, the ones the compiler can do, and the ones you can do but the compiler can't. The latter are the cows. So what I and many other programmers do first is performance tuning on un-optimized code using random sampling. When all the cows are out, turn on the optimizer and let the compiler do its magic.
Source: https://stackoverflow.com/questions/49182065/how-does-a-profiler-sample-a-running-programe