Parallel execution using OpenMP takes longer than serial execution in C++ — am I calculating execution time in the right way?

Posted by 谁说我不能喝 on 2019-12-10 11:42:21

Question


Without using OpenMP directives - serial execution - check screenshot here

Using OpenMP directives - parallel execution - check screenshot here

#include "stdafx.h"
#include <omp.h>
#include <iostream>
#include <time.h>
using namespace std;

static long num_steps = 100000;
double step;
double pi;

int main()
{
    clock_t tStart = clock();
    int i;
    double x, sum = 0.0;
    step = 1.0 / (double)num_steps;

#pragma omp parallel for shared(sum)
    for (i = 0; i < num_steps; i++)
    {
        x = (i + 0.5) * step;
#pragma omp critical
        {
            sum += 4.0 / (1.0 + x * x);
        }
    }

    pi = step * sum;
    cout << pi << "\n";
    printf("Time taken: %.5fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
    getchar();
    return 0;
}

I have tried multiple times, and the serial execution is always faster. Why?

Serial execution time: 0.0200s
Parallel execution time: 0.02500s

Why is serial execution faster here? Am I calculating the execution time in the right way?


Answer 1:


OpenMP implements parallelism with multiple threads, and the benefit of multithreading only shows up with a large volume of work. With a very small workload you cannot measure the gain of a multithreaded application. The reasons:

a) To create a thread the OS needs to allocate memory for it, which takes time (even if only a tiny bit).

b) Running multiple threads involves context switching, which also takes time.

c) The memory allocated to the threads must be released again, which also takes time.

d) The result also depends on the number of processors and the total memory (RAM) in your machine.

So when you run a small operation with multiple threads, its performance will be about the same as with a single thread (by default the OS assigns one thread, the main thread, to every process). Your outcome is therefore expected in this case. To see the benefit of multithreading, use a large amount of data with a more expensive operation; only then will you see a difference.
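For illustration, here is a minimal sketch (my own addition, not part of the original answer) that makes this fixed cost visible: even an essentially empty parallel region takes measurable time for thread start-up and synchronization, which is pure overhead when the actual work is tiny.

#include <omp.h>
#include <cstdio>

int main()
{
    // Time a single, essentially empty parallel region: we only pay the
    // fork/join and scheduling cost, no real work is done.
    double t0 = omp_get_wtime();
#pragma omp parallel
    {
        // intentionally empty
    }
    double t1 = omp_get_wtime();

    printf("empty parallel region: %.6fs with %d threads\n",
           t1 - t0, omp_get_max_threads());
    return 0;
}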




Answer 2:


Because of your critical block you cannot update sum in parallel. Every time one thread reaches the critical section, all other threads have to wait.

The smarter approach is to give each thread a private copy of sum that it can accumulate into without synchronization, and to add up the per-thread results afterwards. OpenMP can do this automatically for you with the reduction clause, so your loop becomes:

#pragma omp parallel for reduction(+:sum)
for (i = 0; i < num_steps; i++)
{
    double x = (i + 0.5)*step;   // declared inside the loop so each thread gets its own x
    sum += 4.0 / (1.0 + x * x);
}

On my machine this performs 10 times faster than the version using the critical block (I also increased num_steps to reduce the influence of one-time actions like thread-creation).

PS: I recommend using <chrono>, <boost/timer/timer.hpp>, or Google Benchmark for timing your code.
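For example, a wall-clock measurement with <chrono> might look like the sketch below (my own illustration; the measured region is just a placeholder). std::chrono::steady_clock measures elapsed wall-clock time, whereas clock() on some platforms reports CPU time summed over all threads, which can make a parallel program look slower than it really is.

#include <chrono>
#include <cstdio>

int main()
{
    auto start = std::chrono::steady_clock::now();

    // ... the code you want to time goes here ...

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;   // seconds as double
    printf("Time taken: %.5fs\n", elapsed.count());
    return 0;
}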



Source: https://stackoverflow.com/questions/49948807/parallel-exection-using-openmp-takes-longer-than-serial-execution-c-am-i-calc
