OpenMP vs gcc compiler optimizations

Submitted by 醉酒当歌 on 2021-01-27 06:14:09

Question


I'm learning OpenMP using the example of computing the value of pi via quadrature. In serial, I run the following C code:

double serial() {
    double step;
    double x,pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    for (int i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step; // midpoint quadrature: sample the middle of each subinterval
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

I'm comparing this to an OpenMP implementation that uses a parallel for with a reduction:

double SPMD_for_reduction() {
    double step;
    double pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    #pragma omp parallel for reduction (+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

With num_steps = 1,000,000,000, and 6 threads for the OpenMP version, I compile and time both functions:

    double start_time = omp_get_wtime();
    serial();
    double end_time = omp_get_wtime();

    start_time = omp_get_wtime();
    SPMD_for_reduction();
    end_time = omp_get_wtime();

With no compiler optimizations, the runtimes are around 4 s (serial) and 0.66 s (OpenMP). With the -O3 flag, the serial runtime drops to ".000001s" while the OpenMP runtime is mostly unchanged. What's going on here? Are vector instructions being used, or is the problem poor code or a poor timing method? If it is vectorization, why isn't the OpenMP function benefiting?

It may be relevant that the machine I am using has a modern 6-core Xeon processor.

Thanks!


Answer 1:


The compiler outsmarts you. For the serial version, it can detect that the result of your computation is never used, so it throws out the computation completely.

double start_time = omp_get_wtime();
serial(); //<-- Computations not used.
double end_time = omp_get_wtime();

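In the OpenMP case, the compiler cannot prove that everything inside the function body is free of side effects, most likely because the parallel region is turned into calls to the OpenMP runtime library that the optimizer cannot see through. To stay on the safe side, it keeps the function call.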

You can of course write something like double serial_pi = serial(); and then, outside of the time measurement, do something with the variable serial_pi, such as printing it. This way the compiler has to keep the function call, and it still applies the optimizations you are actually looking for.
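A minimal sketch of that fix, under a few assumptions that are not in the original post: num_steps is passed as a parameter instead of being a global, time_call and now_seconds are hypothetical helper names, and the timer falls back to clock() when the file is built without -fopenmp. The point is only that each result is stored inside the timed region and consumed afterwards.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: wall-clock seconds from omp_get_wtime() when built
   with -fopenmp, otherwise CPU seconds from clock(). */
double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double) clock() / CLOCKS_PER_SEC;
#endif
}

/* Same loop as serial() in the question; num_steps as a parameter is an
   assumption made to keep the sketch self-contained. */
double serial(long num_steps) {
    double step = 1.0 / (double) num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Same loop as SPMD_for_reduction() in the question. Without -fopenmp the
   pragma is ignored and the loop simply runs serially. */
double SPMD_for_reduction(long num_steps) {
    double step = 1.0 / (double) num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Hypothetical helper: times one call to fn, stores its return value in
   *result so the value stays live, and returns the elapsed seconds. */
double time_call(double (*fn)(long), long num_steps, double *result) {
    double start = now_seconds();
    *result = fn(num_steps);
    double end = now_seconds();
    return end - start;
}

/* Runs both versions and prints each result after its timing is finished,
   which is the "dummy use" of the value suggested above. */
void report(long num_steps) {
    double pi_serial, pi_omp;
    double t_serial = time_call(serial, num_steps, &pi_serial);
    double t_omp = time_call(SPMD_for_reduction, num_steps, &pi_omp);
    printf("serial: pi = %.12f  (%.6f s)\n", pi_serial, t_serial);
    printf("omp:    pi = %.12f  (%.6f s)\n", pi_omp, t_omp);
}
```

Calling report(1000000000) from main and compiling with something like gcc -O3 -fopenmp should give a serial time that reflects real work, since both values are printed. Whether the optimized serial loop then comes close to the OpenMP version depends on the machine and compiler, so the numbers are worth measuring rather than assuming.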



Source: https://stackoverflow.com/questions/34323418/openmp-vs-gcc-compiler-optimizations
