I want to use the hardware performance counters on Intel and AMD x86_64 multicore processors to count the number of stores retired by a program. Each thread should count its own retired stores separately. Can it be done? And if so, how do I do it in C/C++?
You can use Perfctr or PAPI if you want to count hardware events for some part of the program from inside the program itself (without starting any third-party tool).
Perfctr quickstart: http://www.ale.csce.kyushu-u.ac.jp/~satoshi/how_to_use_perfctr.htm
PAPI homepage: http://icl.cs.utk.edu/papi/
PerfSuite good doc: http://perfsuite.ncsa.illinois.edu/publications/LJ135/x27.html
If you can do this externally, there is the perf command on modern Linux.
The best approach is to use perf on Linux, as osgx mentioned, since it is part of the Linux kernel. But it CAN also be called from C/C++ code; there is no need to run external perf stat commands.
To see how, download the kernel source code and look at the perf tool under tools/perf. Alternatively, take a look at this library (by Google, I think):
http://perfmon2.sourceforge.net/docs_v4.html
It is part of the perfmon2 project but is designed to work with perf. Take a look at the perf_examples directory and you will get the idea. That is how I make perf calls from within my C code.
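To make the in-program approach concrete, here is a minimal sketch of per-thread counting with the perf_event_open system call. It rests on several assumptions: the helper names open_thread_counter, start_counter and stop_counter are hypothetical, and the usage below counts the generic PERF_COUNT_HW_INSTRUCTIONS event as a stand-in, because a retired-stores event is model-specific. For stores you would probably pass PERF_TYPE_RAW with an event code taken from the vendor's manual, or let the library linked above resolve the event name. Opening the event with pid = 0 and cpu = -1 ties the counter to the calling thread, so each thread can open and read its own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open a counter bound to the calling thread.
 * Returns a file descriptor, or -1 if the kernel refuses
 * (for example because of perf_event_paranoid or missing PMU support). */
static int open_thread_counter(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;          /* e.g. PERF_TYPE_HARDWARE or PERF_TYPE_RAW */
    attr.config = config;      /* event id; raw codes are CPU-specific */
    attr.disabled = 1;         /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;   /* count user-space activity only */
    attr.exclude_hv = 1;

    /* pid = 0: calling thread, cpu = -1: whichever CPU it runs on. */
    return (int)perf_event_open(&attr, 0, -1, -1, 0);
}

/* Hypothetical helper: zero the counter and start counting. */
static int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1 ? -1 : 0;
}

/* Hypothetical helper: stop counting and read the 64-bit value. */
static int stop_counter(int fd, uint64_t *count)
{
    assert(count != NULL);
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

Each thread would call open_thread_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS) (or the raw stores event for its CPU), wrap the region of interest in start_counter and stop_counter, and close the descriptor when done. Check the return value of the open call, since unprivileged access to the counters may be restricted on some systems.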
The official tool from AMD is named CodeAnalyst.
Have you checked out OProfile yet?
Source: https://stackoverflow.com/questions/7107825/using-hardware-performance-counters-in-linux