I want to use the hardware performance counters that come with Intel and AMD x86_64 multicore processors to count the number of stores retired by a program. I want e
The best approach is to use perf on Linux, as osgx mentioned, since it is part of the Linux kernel. But it CAN also be called directly from C/C++ code, through the perf_event_open system call, so there is no need to shell out to external perf stat calls.
Just download the kernel source code and take a look at it (the perf tool lives under tools/perf). Alternatively, take a look at this library, which I think is by Google:
http://perfmon2.sourceforge.net/docs_v4.html
It is part of the perfmon2 project but is designed to work with perf. Take a look at the perf_examples directory and you will get the idea. That is how I handle perf calls from within my C code.
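
To give a rough idea of what calling perf from inside a C program looks like, here is a minimal sketch using the perf_event_open system call directly. It is not taken from perf_examples. The helper names (open_counter, count_event, store_loop) are made up for illustration, and it counts the generic PERF_COUNT_HW_INSTRUCTIONS event rather than retired stores. For retired stores you would presumably have to pass PERF_TYPE_RAW with a model-specific event code, which differs between Intel and AMD parts and is not shown here; as far as I know, libpfm4 can translate event names into those encodings for you.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open a disabled, user-space-only counter on the
 * calling thread (pid = 0), on any CPU (cpu = -1). Returns an fd or -1. */
static int open_counter(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = type;          /* e.g. PERF_TYPE_HARDWARE or PERF_TYPE_RAW */
    attr.size = sizeof(attr);
    attr.config = config;      /* event id; a raw code is CPU-specific */
    attr.disabled = 1;         /* start stopped, enable explicitly later */
    attr.exclude_kernel = 1;   /* count user space only */
    attr.exclude_hv = 1;

    return (int)perf_event_open(&attr, 0, -1, -1, 0);
}

/* Hypothetical helper: run workload() with the counter enabled and store
 * the result in *count. Returns 0 on success, -1 if the counter could not
 * be opened or read (no PMU access, restrictive perf_event_paranoid, ...). */
static int count_event(uint32_t type, uint64_t config,
                       void (*workload)(void), uint64_t *count)
{
    ssize_t n;
    int fd = open_counter(type, config);

    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Toy workload for illustration: 1024 stores to a volatile array. */
static volatile int sink[1024];

static void store_loop(void)
{
    for (int i = 0; i < 1024; i++)
        sink[i] = i;
}
```

You would call it as count_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, store_loop, &n) and check the return value, since opening the counter can fail inside VMs and containers or when /proc/sys/kernel/perf_event_paranoid is set too high.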