Profiling instructions


Question


I want to count several CPU instructions in my code, e.g. how many additions, how many multiplications, how many float operations, and how many branches my code executes. I currently use gprof under Linux for profiling my C++ code, but it only gives the number of calls to my functions, and I manually estimate the number of instructions from that. Are there any tools that might do the trick for me? Maybe some virtual machine?
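For reference, the gprof workflow described above looks roughly like this (a minimal sketch; myprog stands in for the actual binary):

    # Compile with gprof instrumentation enabled
    g++ -pg -O2 main.cpp -o myprog

    # Run the program; this writes gmon.out to the current directory
    ./myprog

    # Print the flat profile (per-function call counts and self times)
    gprof ./myprog gmon.out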


Answer 1:


You may be able to use Valgrind's Callgrind with the --dump-instr=yes flag to achieve this.
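A minimal sketch of that workflow (myprog is a placeholder for the binary being profiled):

    # Count executed instructions, keeping per-instruction detail
    valgrind --tool=callgrind --dump-instr=yes ./myprog

    # Summarize the result file; the PID suffix varies per run
    callgrind_annotate callgrind.out.<pid>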




Answer 2:


This is general advice, not Linux-specific: you should be interested in CPU cycles instead. Forget about the number of instructions as a measure of performance; one instruction may cost as much as ten others put together, so the count alone won't tell you anything.

You should focus on CPU cycles and, in multithreaded environments (most if not all today), on the time the thread is put to sleep ("switched out"), which gives you an idea of how much time is spent waiting for I/O, the database, etc. to complete, and how that impacts CPU privileged time.
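On Linux specifically, one way to see cycles alongside the raw instruction count is perf stat; this is an addition to the answer above, not something it mentions:

    # Report cycles, instructions, IPC, branches, and branch misses
    perf stat ./myprog

    # Restrict to selected events, including context switches
    perf stat -e cycles,instructions,context-switches ./myprog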




Answer 3:


If you really need to count instructions, then you are probably best off generating assembler and then passing the output to an intelligent grep equivalent. For gcc, try the -S switch.
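A rough sketch of that approach; note these are static counts of the emitted code, not dynamic counts of how often each instruction executes, and the grep patterns below are just illustrative x86 mnemonics:

    # Emit assembler instead of an object file
    g++ -S -O2 main.cpp -o main.s

    # Crude static counts of selected mnemonics
    grep -c -E '\badd[lq]?\b' main.s    # integer additions
    grep -c -E '\bimul[lq]?\b' main.s   # integer multiplications
    grep -c -E '\bj[a-z]+\b' main.s     # jump/branch family (includes jmp)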




Answer 4:


Intel's VTune is free for Linux users, AFAIK (assuming we're talking about an Intel-based x86 Linux machine). It will give you all the info you need and so much more.
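For illustration, a command-line collection with recent VTune releases might look like the lines below; the vtune command name and options have changed across versions, so treat this as an assumption rather than an exact recipe:

    # Collect a hotspots profile into a result directory
    vtune -collect hotspots -result-dir r001 -- ./myprog

    # Print a summary report from the collected result
    vtune -report summary -result-dir r001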




Answer 5:


You can use pin-instat, which is a Pin tool. To use it, you need to install Pin. However, the instruction count alone doesn't say much about performance; cache misses and branch prediction also play big roles.

Disclaimer: I'm the author of pin-instat.
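A Pin tool is typically run as shown below; the path to the pin-instat library is a guess at a typical build location, not the documented one:

    # Run myprog under Pin with the pin-instat tool loaded
    pin -t obj-intel64/pin-instat.so -- ./myprog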




Answer 6:


Just out of curiosity: is instruction count a useful way to profile code performance?

I know that back in the days of "simple" CPU designs, you could reasonably assume that each opcode would take exactly so-many nanoseconds of CPU time to execute, but these days, with all the complex memory caching schemes, on-the-fly opcode re-ordering, pipelining, superscalar architecture, and everything else that's been thrown into the modern CPU, does simply counting opcode executions still give a good indication of how long the code will take to run? Or will execution time vary as much with (for example) memory access patterns and the sequence in which opcodes are executed as with the raw frequency of the opcodes' execution?

My suspicion is that the only way to reliably predict code performance these days is to actually run the code on the target architecture and time it. Often, when it seems like the compiler has emitted inefficient code, it's actually doing something clever that takes advantage of a subtle feature of the modern CPU architecture.



Source: https://stackoverflow.com/questions/1116213/profiling-instructions
