Profiling a (possibly I/O-bound) process to reduce latency

Submitted by 喜夏-厌秋 on 2019-12-03 02:51:52
Mike Dunlavey

Use this method.

It is quite simple and effective at pinpointing opportunities for optimization, whether they are in CPU-bound or I/O-bound code.

If you are right that the biggest opportunities are in a particular function or module, it will find them. If they are elsewhere, it will find those too.

Of the tools you mentioned and discarded, it is most similar to the poor man's profiler, though still not very similar.

EDIT: Since you say it is triggered by a user interaction and blocks further input until it completes, here's how I would do it.

First, I assume it does not block a manual interrupt signal to the debugger, because otherwise you'd have no way to stop an infinite loop. Second, I would wrap a loop around the routine in question that runs it 10, 100, or 1000 times, so that it runs long enough to be interrupted manually.
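In Python, a minimal sketch of that outer loop might look like this. It assumes the slow routine can be called as a zero-argument function (the name `handle_user_action` below is hypothetical). Without a debugger, pressing Ctrl-C raises `KeyboardInterrupt` at whatever line is executing, so the printed traceback plays the role of one stack sample.

```python
import traceback

def run_repeatedly(routine, times=100):
    """Call `routine` `times` times in a row so that a short operation runs
    long enough to be interrupted by hand. Returns how many calls completed."""
    completed = 0
    try:
        for _ in range(times):
            routine()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt wherever the code happens to be;
        # this traceback is one stack sample showing how it got there.
        traceback.print_exc()
    return completed
```

Usage would be something like `run_repeatedly(handle_user_action, times=1000)`, pressing Ctrl-C part way through and reading the traceback; run it again for each further sample.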

Now, suppose it spends some fraction of its time doing I/O, say 50%. Then each time you interrupt it, you have a 50% chance of catching it in the I/O. When you do catch it there, which the call stack will show, you can also see in great detail where the I/O is being requested from and why.
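The same sampling can be automated. The Python sketch below (Unix only, main thread only) uses `signal.setitimer` with `ITIMER_REAL`, which counts wall-clock time rather than CPU time, so a sample is taken even while the code is blocked in a sleep or a read. `fake_io`, `compute` and `handle_user_action` are hypothetical stand-ins, each phase taking about half the run. It keeps the raw stacks instead of aggregating them, because the point is to read each sample; note it only sees Python frames, so time inside a C call shows up as the Python line that made the call.

```python
import signal
import time
import traceback

def sample_stacks(routine, interval=0.01):
    """Run `routine` once, recording the Python call stack every `interval`
    seconds of wall-clock time. Unix only; must run in the main thread."""
    samples = []

    def on_alarm(signum, frame):
        # `frame` is wherever the main thread was when the timer fired,
        # including a line that is blocked in sleep() or a read().
        samples.append(traceback.extract_stack(frame))

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)  # wall-clock timer
    try:
        routine()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # stop the timer
        signal.signal(signal.SIGALRM,
                      old_handler if old_handler is not None else signal.SIG_DFL)
    return samples

# Hypothetical routine: about half its time blocked, half computing.
def fake_io():
    time.sleep(0.05)  # stands in for a blocking read

def compute():
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass

def handle_user_action():
    fake_io()
    compute()
```

Printing each sample with `"".join(traceback.format_list(stack))` shows the full chain of calls that led to the blocked or busy line, which is the information the random-pause method relies on.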

It will show you what's going on, which is almost certainly something surprising. If you see it doing something on as few as two (2) samples that you could find a way to eliminate, then you will get a considerable speedup. You don't know in advance exactly how much time eliminating that activity will save, but on average you can expect to save the fraction F = (s+1)/(n+2), where n is the total number of samples you took and s is the number of samples that showed the activity (Laplace's Rule of Succession). For example, if you took 4 stack samples and saw the activity on 2 of them, on average it would save F = 3/6 = 1/2, corresponding to a speedup factor of 1/(1-F) = 2.
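The arithmetic of that estimate, as a small Python sketch (the function names are made up for illustration). `Fraction` keeps the results exact.

```python
from fractions import Fraction

def expected_fraction_saved(s, n):
    """Rule of Succession estimate (s + 1) / (n + 2) of the fraction of time an
    activity costs, given that it appeared on s of n stack samples."""
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    return Fraction(s + 1, n + 2)

def speedup_factor(fraction_saved):
    """Removing a fraction F of the run time makes it 1 / (1 - F) times faster."""
    return 1 / (1 - fraction_saved)
```

With the numbers from the example, `expected_fraction_saved(2, 4)` is `Fraction(1, 2)`, and `speedup_factor(Fraction(1, 2))` is 2. Since s never exceeds n, F stays below 1 and the speedup is always finite.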

Once you've done that, you can do it again and find something else to fix. The speedup factors multiply together like compound interest.
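To make the compounding concrete: if each fix removes a fraction F of the run time as it was just before that fix, it contributes a factor 1/(1-F), and the factors multiply. A minimal Python sketch, with a made-up function name and hypothetical numbers in the usage note:

```python
import math
from fractions import Fraction

def combined_speedup(fractions_removed):
    """Overall speedup after a series of fixes, where each entry is the
    fraction F of the then-current run time that one fix removed.
    Each fix contributes a factor 1 / (1 - F); the factors multiply."""
    return math.prod(1 / (1 - Fraction(f)) for f in fractions_removed)
```

For instance, removing 1/2 of the time, then 1/3 of what is left, then 1/5 of what is left after that gives 2 × 3/2 × 5/4 = 15/4, a 3.75× speedup overall, even though no single fix was better than 2×.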

Then of course you remove the outer loop and "cash in" all the speedups you got.

If you are wondering how this differs from profiling: by carefully examining each stack sample, and possibly related data, you can recognize activities that you could remove, whereas if all you have is measurements, you are left trying to intuit what is going on. The actual amount of time you save is what it is, regardless of any measurements. The important thing is to find the problem; no matter how precisely a profiler might measure it, if you can't find it, you're not winning. These pages are full of people saying either that they don't understand what their profiler is telling them, or that it seems to say there is nothing to fix, which they are only too willing to accept. That's a case of rose-tinted glasses.

More on all that.

For I/O-bound applications you can use callgrind's --collect-systime=yes option.

This collects the time spent in system calls (in milliseconds), so if you believe you have an I/O bottleneck, you can use these statistics to identify it.

Todo: check out 'perf' (again)

  • fork()
  • in the child: execxxx(process under test)
  • in the parent:
    • (in a loop) periodically call getrusage(RUSAGE_CHILDREN, ...)

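getrusage() gives you not only the CPU usage but also the major/minor page faults, the number of context switches, etc. Whatever part of the elapsed (wall-clock) time is not CPU time was probably spent waiting for I/O. Note, however, that RUSAGE_CHILDREN only reports on children that have terminated and been waited for, so polling it while the child is still running won't show the child's figures; collect them when the child exits (for example with wait4()), or on Linux read /proc/<pid>/stat for live CPU time and page-fault counts. This won't give you detailed profiling information, but it does give a nice overall footprint of the program's behavior, comparable to running vmstat on a per-process basis.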
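A minimal Python sketch of this fork/exec/measure approach, using `os.fork`, `os.execvp` and `os.wait4` (Unix only). It waits for the child with wait4() rather than polling getrusage(RUSAGE_CHILDREN, ...), because the latter only covers children that have already been reaped. The function name and dictionary keys are made up for this example, and the "not on CPU" figure is simply wall time minus CPU time: a hint of I/O wait, not proof of it, since the process may also be sleeping, waiting on locks, or waiting to be scheduled.

```python
import os
import time

def run_and_measure(argv):
    """fork() + exec the process under test, reap it with wait4(), and return
    its resource usage plus the elapsed wall-clock time. Unix only."""
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the program under test.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: wait4() returns the child's rusage once it has exited.
    _, status, usage = os.wait4(pid, 0)
    wall = time.monotonic() - start
    cpu = usage.ru_utime + usage.ru_stime
    return {
        "exit_code": os.WEXITSTATUS(status) if os.WIFEXITED(status) else None,
        "wall_s": wall,
        "cpu_s": cpu,
        "not_on_cpu_s": max(0.0, wall - cpu),  # often I/O wait, not necessarily
        "major_faults": usage.ru_majflt,
        "minor_faults": usage.ru_minflt,
        "voluntary_ctx_switches": usage.ru_nvcsw,
        "involuntary_ctx_switches": usage.ru_nivcsw,
    }
```

For example, `run_and_measure(["sleep", "0.2"])` should report roughly 0.2 s of wall time and very little CPU time, with a high count of voluntary context switches relative to involuntary ones being a typical sign of a process that mostly blocks.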
