Function profiling woes - Visual Studio 2010 Ultimate

陌清茗 2020-12-01 22:13

I am trying to profile my application to monitor the effects of a function, both before and after refactoring. I have performed an analysis of my application and having look

2 answers
  •  野趣味 (original poster)
    2020-12-01 22:27

    Do you mind too much if I talk a bit about profiling, what works and what doesn't?

    Let's make up an artificial program, some of whose statements are doing work that can be optimized away - i.e. they are not really necessary. They are "bottlenecks".

    Subroutine foo runs a CPU-bound loop that takes one second. Also assume subroutine CALL and RETURN instructions take insignificant or zero time, compared to everything else.

    Subroutine bar calls foo 10 times, but 9 of those times are unnecessary, which you don't know in advance and can't tell until your attention is directed there.

    Subroutines A, B, C, ..., J are 10 subroutines, and they each call bar once.

    The top-level routine main calls each of A through J once.

    So the total call tree looks like this:

    main
      A
        bar
          foo
          foo
          ... total 10 times for 10 seconds
      B
        bar
          foo
          foo
          ...
      ...
      J
        ...
    (finished)
    

    How long does it all take? 100 seconds, obviously.
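The arithmetic behind that figure can be written out as a small sketch. The constant and function names here are made up for illustration; only the numbers come from the description above.

```python
# Hypothetical model of the toy program described above.
FOO_SECONDS = 1          # foo: one CPU-bound second per call
FOO_CALLS_PER_BAR = 10   # bar calls foo 10 times
CALLERS = "ABCDEFGHIJ"   # A..J each call bar once


def total_seconds():
    """Run time as written: 10 callers x 10 calls x 1 second."""
    return len(CALLERS) * FOO_CALLS_PER_BAR * FOO_SECONDS


def seconds_after_fix(needed_calls=1):
    """Run time if bar kept only the one call to foo that is necessary."""
    return len(CALLERS) * needed_calls * FOO_SECONDS


print(total_seconds())      # 100
print(seconds_after_fix())  # 10
```

So removing the 9 unnecessary calls in bar would take the program from 100 seconds to 10.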

    Now let's look at profiling strategies. Suppose stack samples (say, 1000 of them) are taken at uniform intervals.

    1. Is there any self time? Yes. foo takes 100% of the self time. It's a genuine "hot spot". Does that help you find the bottleneck? No. Because it is not in foo.

    2. What is the hot path? Well, the stack samples look like this:

      main -> A -> bar -> foo (100 samples, or 10%)
      main -> B -> bar -> foo (100 samples, or 10%)
      ...
      main -> J -> bar -> foo (100 samples, or 10%)

    There are 10 hot paths, and none of them looks big enough to gain you much speedup.
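To see where those counts come from, here is a minimal simulation of the sampling, assuming each of A through J owns one 10-second slice of the 100-second run. `stack_at` and `take_samples` are hypothetical helpers, not part of any profiler.

```python
from collections import Counter

CALLERS = "ABCDEFGHIJ"
TOTAL_SECONDS = 100
N_SAMPLES = 1000


def stack_at(t):
    """Call stack of the toy program at wall-clock time t (seconds)."""
    caller = CALLERS[int(t // 10)]  # A runs during 0-10 s, B during 10-20 s, ...
    return ("main", caller, "bar", "foo")


def take_samples(n=N_SAMPLES):
    """Sample the stack at the midpoint of n equal time intervals."""
    step = TOTAL_SECONDS / n
    return [stack_at((k + 0.5) * step) for k in range(n)]


samples = take_samples()
paths = Counter(" -> ".join(s) for s in samples)
for path, count in paths.items():
    print(f"{path}: {count} samples, {100 * count / len(samples):.0f}%")
```

Each of the 10 paths ends up with 100 of the 1000 samples, which is why no single path stands out.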

    IF YOU HAPPEN TO GUESS, and IF THE PROFILER ALLOWS, you could make bar the "root" of your call tree. Then you would see this:

    bar -> foo (1000 samples, or 100%)
    

    Then you would know that foo and bar were each independently responsible for 100% of the time and therefore are places to look for optimization. You look at foo, but of course you know the problem isn't there. Then you look at bar and you see the 10 calls to foo, and you see that 9 of them are unnecessary. Problem solved.
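Re-rooting amounts to cutting off everything above the chosen routine in each sample and counting again. A rough sketch, assuming the 1000 samples are stored as tuples of routine names (`reroot` is a made-up helper):

```python
from collections import Counter

CALLERS = "ABCDEFGHIJ"
# 100 samples per caller, matching the hot-path listing.
samples = [("main", c, "bar", "foo") for c in CALLERS for _ in range(100)]


def reroot(stack, root):
    """Return the part of the stack from `root` downward, or None if absent."""
    if root not in stack:
        return None
    return stack[stack.index(root):]


rerooted = Counter(reroot(s, "bar") for s in samples)
for path, count in rerooted.items():
    print(" -> ".join(path), f"({count} samples, {100 * count / len(samples):.0f}%)")
```

The ten 10% paths collapse into a single bar -> foo path holding all 1000 samples.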

    IF YOU DIDN'T HAPPEN TO GUESS, and instead the profiler simply showed you the percent of samples containing each routine, you would see this:

    main 100%
    bar  100%
    foo  100%
    A    10%
    B    10%
    ...
    J    10%
    

    That tells you to look at main, bar, and foo. You see that main and foo are innocent. You look at where bar calls foo and you see the problem, so it's solved.
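The per-routine percentages are just the fraction of samples in which each routine appears anywhere on the stack. A sketch under the same assumptions about how samples are stored (`inclusive_percent` is a hypothetical name):

```python
from collections import Counter

CALLERS = "ABCDEFGHIJ"
samples = [("main", c, "bar", "foo") for c in CALLERS for _ in range(100)]


def inclusive_percent(samples):
    """Percent of samples in which each routine appears on the stack."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # count a routine once per sample, even if recursive
    return {name: 100 * n / len(samples) for name, n in counts.items()}


percent = inclusive_percent(samples)
for name in sorted(percent, key=percent.get, reverse=True):
    print(f"{name:5s}{percent[name]:.0f}%")
```

Sorting by this number puts main, bar, and foo at the top without any guessing about where to root the tree.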

    It's even clearer if in addition to showing you the functions, you can be shown the lines where the functions are called. That way, you can find the problem no matter how large the functions are in terms of source text.

    NOW, let's change foo so that it does sleep(oneSecond) rather than be CPU bound. How does that change things?

    What it means is it still takes 100 seconds by the wall clock, but the CPU time is zero. Sampling in a CPU-only sampler will show nothing.
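The gap between wall-clock time and CPU time is easy to observe directly. This sketch uses Python's `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the process, which excludes sleeping); the 0.2-second sleep is an arbitrary stand-in for the one second in the example.

```python
import time


def foo(seconds=0.2):
    """Blocking version of foo; sleeps instead of burning CPU."""
    time.sleep(seconds)


wall_start = time.perf_counter()
cpu_start = time.process_time()
foo()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall-clock figure is about the length of the sleep, while the CPU figure stays near zero, which is all a CPU-only sampler has to work with.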

    So now you are told to try instrumentation instead of sampling. Among all the things it tells you are the percentages shown above, so in this case you could find the problem, assuming bar was not very big. (There may be reasons to write small functions, but should satisfying the profiler be one of them?)

    Actually, the main thing wrong with the sampler was that it can't sample during sleep (or I/O or other blocking), and it doesn't show you code line percents, only function percents.

    By the way, 1000 samples give you nice precise-looking percents. Suppose you took fewer samples. How many do you actually need to find the bottleneck? Well, since the bottleneck (the 9 unnecessary calls out of 10) is on the stack 90% of the time, if you took only 10 samples, it would be on about 9 of them, so you'd still see it. Even if you took as few as 3 samples, the probability that it would appear on two or more of them is 97.2%.**

    High sample rates are way overrated, when your goal is to find bottlenecks.

    Anyway, that's why I rely on random-pausing.
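Random pausing is normally done by hand in a debugger, but the idea can be imitated in code. The following is only a sketch, assuming CPython (it relies on `sys._current_frames`); the call tree is scaled down, with main calling bar directly and foo sleeping 0.02 s, and `pause_at_random` is a made-up helper.

```python
import random
import sys
import threading
import time
import traceback


def foo():
    time.sleep(0.02)  # scaled-down blocking "work"


def bar():
    for _ in range(10):  # in the toy program, 9 of these calls are unnecessary
        foo()


def main():
    for _ in range(10):  # stands in for A..J each calling bar
        bar()


def pause_at_random(thread_id, n, stacks):
    """Record the target thread's stack at n random moments."""
    for _ in range(n):
        time.sleep(random.uniform(0.01, 0.05))
        frame = sys._current_frames().get(thread_id)
        if frame is None:
            break
        stacks.append([f.name for f in traceback.extract_stack(frame)])


stacks = []
sampler = threading.Thread(
    target=pause_at_random, args=(threading.get_ident(), 10, stacks)
)
sampler.start()
main()
sampler.join()

for s in stacks:
    print(" -> ".join(s))
```

Because the stacks are taken on wall-clock time, the sleeping foo still shows up, and nearly every one of the handful of stacks passes through bar.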

    ** How did I get 97.2 percent? Think of it as tossing a coin 3 times, a very unfair coin, where "1" means seeing the bottleneck. There are 8 possibilities:

           #1s  probability
    0 0 0  0    0.1^3 * 0.9^0 = 0.001
    0 0 1  1    0.1^2 * 0.9^1 = 0.009
    0 1 0  1    0.1^2 * 0.9^1 = 0.009
    0 1 1  2    0.1^1 * 0.9^2 = 0.081
    1 0 0  1    0.1^2 * 0.9^1 = 0.009
    1 0 1  2    0.1^1 * 0.9^2 = 0.081
    1 1 0  2    0.1^1 * 0.9^2 = 0.081
    1 1 1  3    0.1^0 * 0.9^3 = 0.729
    

    so the probability of seeing it 2 or 3 times is .081*3 + .729 = .972
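The same number falls out of the binomial formula, or of brute-force enumeration of the 8 outcomes in the table. A short check, with hypothetical function names:

```python
from itertools import product
from math import comb


def prob_at_least(k, n, p):
    """Probability of seeing the bottleneck on at least k of n samples."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_by_enumeration(k, n, p):
    """Same probability, summed over every 0/1 outcome as in the table."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        ones = sum(outcome)
        if ones >= k:
            total += p**ones * (1 - p) ** (n - ones)
    return total


print(round(prob_at_least(2, 3, 0.9), 3))        # 0.972
print(round(prob_by_enumeration(2, 3, 0.9), 3))  # 0.972
```

Changing `n` and `k` shows how quickly the chance of missing a 90% bottleneck shrinks as a few more samples are added.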
