How to read performance counters on i5, i7 CPUs

Posted by 老子叫甜甜 on 2019-11-27 13:10:57

Question


Modern CPUs have quite a lot of performance counters (see http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html). How do I read them? I'm interested in cache misses and branch mispredictions.


Answer 1:


It looks like PAPI has a very clean API and works just fine on Ubuntu 11.04. Once it is installed, the following program does what I wanted:

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define NUM_EVENTS 4

void matmul(const double *A, const double *B,
        double *C, int m, int n, int p)
{
    int i, j, k;
    for (i = 0; i < m; ++i)
        for (j = 0; j < p; ++j) {
            double sum = 0;
            for (k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*p + j];
            C[i*p + j] = sum;
        }
}

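int main(void)
{
    /* An enum constant (unlike "const int" in C) is a true compile-time
       constant, so the arrays below can have static storage. */
    enum { size = 300 };
    /* static: zero-initialized and kept off the stack (about 2 MB in total) */
    static double a[size][size];
    static double b[size][size];
    static double c[size][size];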

    int event[NUM_EVENTS] = {PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_BR_MSP, PAPI_L1_DCM };
    long long values[NUM_EVENTS];

    /* Start counting events */
    if (PAPI_start_counters(event, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_start_counters - FAILED\n");
        exit(1);
    }

    matmul((double *)a, (double *)b, (double *)c, size, size, size);

    /* Read the counters */
    if (PAPI_read_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_read_counters - FAILED\n");
        exit(1);
    }

    printf("Total instructions: %lld\n", values[0]);
    printf("Total cycles: %lld\n", values[1]);
    printf("Instr per cycle: %2.3f\n", (double)values[0] / (double) values[1]);
    printf("Branches mispredicted: %lld\n", values[2]);
    printf("L1 Cache misses: %lld\n", values[3]);

    /* Stop counting events */
    if (PAPI_stop_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_stop_counters - FAILED\n");
        exit(1);
    }

    return 0;
}

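I tested this on an Intel Q6600, which supports up to 4 performance events at a time. Your processor may support more or fewer. Note that PAPI_start_counters, PAPI_read_counters and PAPI_stop_counters belong to PAPI's older high-level API, which was removed in PAPI 6.0, so this program needs an older PAPI release or a port to the event-set (low-level) API.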




Answer 2:


What about perf? perf list hw cache shows 33 different events and the man page shows how to use raw performance counter descriptors.
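
The perf tool is built on the Linux perf_event_open system call, so the same counters can also be read from inside a program. The following is only a minimal sketch, assuming a Linux system whose kernel and perf_event_paranoid setting allow unprivileged user-space counting. The helper names perf_open, count_event and branchy_work are made up for illustration and are not part of any library. It counts a single generic hardware event around a function call.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no wrapper for perf_event_open,
   so the call goes through syscall(). Returns a file descriptor or -1. */
static int perf_open(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        /* created stopped; enabled explicitly later */
    attr.exclude_kernel = 1;  /* count user-space activity only */
    attr.exclude_hv = 1;      /* ignore hypervisor activity */
    /* pid = 0, cpu = -1: the calling thread, on whichever CPU it runs */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: runs work(arg) with one counter active and stores
   the event count in *count. Returns 0 on success, or -1 if the counter
   could not be opened or read (no PMU, insufficient permissions, ...). */
static int count_event(uint32_t type, uint64_t config,
                       void (*work)(void *), void *arg, uint64_t *count)
{
    int fd = perf_open(type, config);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With the default read_format, read() yields one 64-bit count. */
    int ok = read(fd, count, sizeof *count) == (ssize_t)sizeof *count;
    close(fd);
    return ok ? 0 : -1;
}

/* Sample workload: a data-dependent branch driven by a pseudo-random
   sequence (a compiler may still turn it into a conditional move). */
static void branchy_work(void *arg)
{
    volatile long *sink = arg;
    uint32_t x = 12345u;
    for (int i = 0; i < 1000000; ++i) {
        x = x * 1664525u + 1013904223u;  /* linear congruential step */
        if (x & 0x10000u)
            *sink += 1;
    }
}
```

Given a long sink = 0 and a uint64_t n, calling count_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES, branchy_work, &sink, &n) leaves the misprediction count in n when it returns 0. Passing PERF_COUNT_HW_CACHE_MISSES instead counts cache misses. A return value of -1 usually means the environment does not expose the counter, which is common inside virtual machines and containers.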




Answer 3:


Performance counters are read with the RDPMC instruction.

EDIT: To add a bit more information: reading performance counters directly is not easy, and describing it here would take pages upon pages. It also involves writes to Model Specific Registers, which require privileged instructions. I would instead advise using an existing profiler, such as OProfile or Intel VTune, both of which are built on performance counters.




Answer 4:


I think there is a library available for this, called perfmon2 (http://perfmon2.sourceforge.net/). Documentation is available at http://www.hpl.hp.com/research/linux/perfmon/perfmon.php4 and http://www.hpl.hp.com/techreports/2004/HPL-2004-200R1.html. I have only recently started digging into this library, and I will post example code as soon as I figure it out.



Source: https://stackoverflow.com/questions/8091182/how-to-read-performance-counters-on-i5-i7-cpus
